qiuqiangkong / torchlibrosa

MIT License

Changed defaults for mel spectrogram filters. #2

Closed RicherMans closed 3 years ago

RicherMans commented 3 years ago

Hey 秋强, I am a bit late with this pull request, but as I meant to say last week, I would suggest changing your filter defaults to match librosa's. I think you chose the current defaults (fmin=50, fmax=14000) based on your experiments. This pull request changes the defaults to fmin=0 and fmax=sr//2. It might break existing code that relied on your previous defaults.

Reasons why I think the change is necessary:

For people who use librosa as their feature-extraction front end, the change would allow them to switch front-end libraries between training and evaluation. For example, training is done on features extracted with librosa, but during evaluation, where one might need GPU-accelerated features, this library can be used instead. For me, this is an actual use case, since storing raw audio consumes much more space than storing extracted log-mel spectrograms.

I wrote an additional test that checks whether the default features extracted by librosa and torchlibrosa are nearly equal.
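
To illustrate why features extracted with the two sets of defaults cannot match, here is a rough standalone sketch (not the test from this pull request) that computes where the mel band edges fall for each setting. It uses only the standard library and the HTK-style mel formula, which is an assumption made for brevity; librosa's own default mel scale is the Slaney variant. The sample rate of 32000 and `n_mels = 64` are hypothetical values chosen for the example, and `mel_band_edges` is an illustrative helper, not part of either library.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel formula (assumption; librosa defaults to the Slaney variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, fmin, fmax):
    # Hypothetical helper: n_mels triangular filters need n_mels + 2 edge
    # frequencies, spaced evenly on the mel scale between fmin and fmax.
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


sr = 32000   # hypothetical sample rate
n_mels = 64  # hypothetical number of mel bins

# Defaults before this change versus librosa-style defaults.
old_edges = mel_band_edges(n_mels, fmin=50, fmax=14000)
new_edges = mel_band_edges(n_mels, fmin=0, fmax=sr // 2)

# Largest difference in Hz between corresponding band edges.
max_shift = max(abs(a - b) for a, b in zip(old_edges, new_edges))
print(f"largest band-edge shift: {max_shift:.1f} Hz")
```

Because every filter covers a different frequency range under the two settings, the resulting log-mel features differ bin by bin, so a model trained on one set should not be evaluated on the other.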

Also thanks for the recent talk!

Henri (丁翰林)