Open yangb05 opened 1 year ago
Hmm, I remember disabling it because I found the reverse to be true on some systems. I think the best way forward would be to expose control over this to the user. I'll aim to make a PR to enable this later; I was recently refactoring some of this code, so it should be easily doable.
Regarding 48kHz vs 16kHz, I'm not sure I got your point. OPUS is always decoded to 48kHz even if the original audio had a smaller sampling rate, unless I missed something.
For example, I have a .opus file in my dataset. torchaudio.info() reports its sampling rate as 16kHz, and ffmpeg also reports an input sampling rate of 16kHz. If force_opus_sampling_rate is not passed to read_opus_ffmpeg, the samples are read at the actual rate of 16kHz, while the recording stores the default sampling rate of 48kHz. Suppose read_opus_ffmpeg reads 30,000 samples from this .opus file and the recorded sampling rate is 48kHz. When I resample it to 16kHz in the cut set, the recorded number of samples is reduced from 30,000 to 10,000. Now,
The recorded info: {sampling rate: 16kHz, num_samples: 10000}
The actual info: {sampling rate: 16kHz, num_samples: 30000}
It will cause a mismatch in the subsequent computations.
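To make the arithmetic above concrete, here is a minimal sketch in plain Python. The helper name `recorded_vs_actual` is hypothetical and not part of lhotse; it only reproduces the numbers from the example, assuming the manifest rescales the sample count by the ratio of target to stored sampling rate.

```python
def recorded_vs_actual(num_samples_read, assumed_sr, actual_sr, target_sr):
    """Compare the sample count a manifest would record with the real one.

    num_samples_read: samples actually decoded from the file
    assumed_sr:       sampling rate stored in the recording (e.g. the 48kHz default)
    actual_sr:        sampling rate the samples were really decoded at
    target_sr:        sampling rate requested when resampling
    """
    # The manifest rescales the count as if the audio were at assumed_sr.
    recorded = round(num_samples_read * target_sr / assumed_sr)
    # The audio itself is rescaled from its true rate.
    actual = round(num_samples_read * target_sr / actual_sr)
    return recorded, actual


recorded, actual = recorded_vs_actual(
    num_samples_read=30_000, assumed_sr=48_000, actual_sr=16_000, target_sr=16_000
)
print(recorded, actual)  # 10000 30000
```

The same mismatch shows up in the duration: 30,000 samples are 0.625 s at 48kHz but 1.875 s at 16kHz.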
If the file has 16kHz, that makes sense. I just never encountered an OPUS file that actually has a sampling rate other than 48kHz, even when I encoded WAV data into OPUS that had a smaller SR...
I think your proposed changes make sense, could you make a PR?
OK.
I have recently been processing a large dataset with .wav and .opus files, and found that processing .wav files is nearly 6 times faster than processing .opus files, specifically when generating recordings and supervisions. After debugging, I found that the difference is that .wav files are processed with torchaudio and .opus files are processed with ffmpeg. The read_opus function in lhotse/audio/backend.py is:
Although the note says ffmpeg is faster, in my case torchaudio is better. When I simply use read_opus_torchaudio in the above code, the speedup appears. pytorch: 1.13 ffmpeg: torchaudio:
Also, there is another problem when using the read_opus_ffmpeg function:
It assumes that all .opus files have a sampling_rate of 48000, which will be a problem if the dataset is unusual; in my case, for example, it could be 16000. Then the recorded sampling_rate will be 48000 while the file is read at its actual sampling_rate of 16000 if force_opus_sampling_rate is not specified, which will affect the subsequent computation of num_samples and features. I think setting '-ar sampling_rate' in the cmd will solve the problem, for example:
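A rough sketch of that idea follows. It is an illustration only, not lhotse's actual read_opus_ffmpeg: `build_opus_ffmpeg_cmd` is a hypothetical helper, and the surrounding options (`-threads 1`, raw `f32le` output to `pipe:1`) are assumptions about how the decoder might be invoked. The one point it shows is that `-ar` is always passed, so the decoded audio matches the sampling rate stored in the recording.

```python
import shlex


def build_opus_ffmpeg_cmd(path, sampling_rate=48000):
    """Build an ffmpeg command (hypothetical helper) that decodes `path`
    to raw 32-bit float PCM on stdout at exactly `sampling_rate` Hz."""
    return [
        "ffmpeg",
        "-threads", "1",
        "-i", str(path),
        # Always request the output rate explicitly, so the decoded samples
        # agree with the sampling rate written to the recording.
        "-ar", str(sampling_rate),
        "-f", "f32le",
        "pipe:1",
    ]


cmd = build_opus_ffmpeg_cmd("audio.opus", sampling_rate=16000)
print(shlex.join(cmd))
```

The command list could then be handed to `subprocess.run(cmd, capture_output=True)` and the stdout bytes interpreted as float32 samples.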