turian closed this issue 1 year ago
Digging into this a bit more. The audio I'm using has sample rates of 48000 and 44100 Hz.
The bug appears to be that resample happens after padding / trimming in data.py.
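To make the ordering problem concrete, here is a minimal sketch that tracks sample counts only, not audio. It is not the code in data.py: `TARGET_SR`, `MAX_LENGTH`, and all function names are hypothetical, and it assumes a resampler whose output length is ceil(num_samples * target_sr / orig_sr).

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding)."""
    return -(-a // b)


def resampled_length(num_samples: int, orig_sr: int, target_sr: int) -> int:
    # Assumption: output length is ceil(num_samples * target_sr / orig_sr).
    return ceil_div(num_samples * target_sr, orig_sr)


def trim(num_samples: int, max_length: int) -> int:
    # Trimming only ever shortens a clip.
    return min(num_samples, max_length)


TARGET_SR = 16000   # hypothetical target sample rate
MAX_LENGTH = 16000  # hypothetical max length, meant as 1 second at TARGET_SR


def trim_then_resample(num_samples: int, orig_sr: int) -> int:
    # Order described in the bug: max_length is applied at the source rate.
    return resampled_length(trim(num_samples, MAX_LENGTH), orig_sr, TARGET_SR)


def resample_then_trim(num_samples: int, orig_sr: int) -> int:
    # Order one would expect: max_length is applied at the target rate.
    return trim(resampled_length(num_samples, orig_sr, TARGET_SR), MAX_LENGTH)


for orig_sr in (48000, 44100):
    n = orig_sr * 5  # a 5 second clip at the source rate
    print(orig_sr, trim_then_resample(n, orig_sr), resample_then_trim(n, orig_sr))
```

Under these assumptions, trimming first leaves the 48000 Hz and 44100 Hz clips with different lengths after resampling, so they cannot be stacked into one batch, while resampling first gives both clips exactly `MAX_LENGTH` samples.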
BTW, torchaudio writes:
"transforms.Resample precomputes and reuses the resampling kernel, so using it will result in more efficient computation if resampling multiple waveforms with the same resampling parameters."
@lucidrains I started to write a PR, but because of the bug I couldn't tell what the desired behavior is when multiple sample rates are specified:
A few comments would be helpful :)
@turian hey Joseph! thanks for identifying this issue
put in a fix
and your other point is a good one, let me make sure one can specify a different target max length per resample frequency as well
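A rough sketch of what "a different target max length per resample frequency" could mean, again counting samples only. This is not the library's implementation: `plan_lengths` and its parameters are hypothetical names, and the resampled length is assumed to be ceil(num_samples * target_sr / orig_sr).

```python
from typing import Sequence, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding)."""
    return -(-a // b)


def plan_lengths(
    num_samples: int,
    orig_sr: int,
    target_srs: Sequence[int],
    max_lengths: Sequence[int],
) -> Tuple[int, ...]:
    """Return one output length per target sample rate.

    Each target rate gets its own max length, applied after resampling.
    """
    if len(target_srs) != len(max_lengths):
        raise ValueError("need exactly one max length per target sample rate")
    lengths = []
    for target_sr, max_length in zip(target_srs, max_lengths):
        # Assumed resampled length, then trim at the target rate.
        resampled = ceil_div(num_samples * target_sr, orig_sr)
        lengths.append(min(resampled, max_length))
    return tuple(lengths)


# A 5 second clip at 44100 Hz, with hypothetical targets of 16 kHz and 24 kHz,
# each capped at 1 second worth of samples at its own rate.
long_clip = plan_lengths(44100 * 5, 44100, (16000, 24000), (16000, 24000))

# A 0.5 second clip at 48000 Hz is shorter than either cap, so it is not trimmed.
short_clip = plan_lengths(24000, 48000, (16000, 24000), (16000, 24000))
print(long_clip, short_clip)
```

In this sketch every item yields one length per target rate, so a batch would naturally be a tuple with one entry per rate; clips shorter than the cap would still need padding before they can be stacked.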
@turian ok done, let me know if 0.4.6 works!
@lucidrains Okay it works! That's wonderful. Thanks Phil
Do you mind adding a comment clarifying the intended batch shape when multiple target sample rates are defined?
The code appears to be able to handle audio of varying lengths. Indeed, librispeech contains audio of different lengths.
However, when I run on a corpus of mixed-length audio, I get the following error: