Open dragen1860 opened 2 years ago
Hi dragen1860, the feature extraction in SpeechBrain is indeed different from the inference code, so I opened a new branch and revised that part. You can refer to it here: https://github.com/zycv/speechbrain/blob/OpenSpeaker/speechbrain/lobes/features.py#L139 These revisions were made a long time ago, so please leave a message if you have any questions.
Dear author: I tried to compare your fbank with the torchaudio fbank. For an input with shape [1, 16000], your output has shape [1, 98, 80], whereas torchaudio gives [101, 80]. I guess the padding differs between your implementation and torchaudio's. Could you give some tips on how to align the two implementations? Thank you very much.
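For anyone hitting the same mismatch, the two frame counts can be reproduced with plain arithmetic. This is only a sketch under assumptions that the thread does not state: a 25 ms window (400 samples) and a 10 ms hop (160 samples) at 16 kHz. The helper names below are hypothetical and belong to neither library. Under those assumptions, 98 frames matches framing that keeps only complete windows, and 101 frames matches framing that pads the signal by half a window on each side.

```python
def frames_without_padding(num_samples: int, win_length: int, hop_length: int) -> int:
    """Frame count when only complete windows are kept (no edge padding)."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def frames_with_center_padding(num_samples: int, hop_length: int) -> int:
    """Frame count when the signal is padded by half an (even-length) window per side."""
    return 1 + num_samples // hop_length


# Assumed settings, not confirmed by either post: 16 kHz audio,
# 400-sample window, 160-sample hop.
NUM_SAMPLES = 16000
WIN_LENGTH = 400
HOP_LENGTH = 160

unpadded = frames_without_padding(NUM_SAMPLES, WIN_LENGTH, HOP_LENGTH)
centered = frames_with_center_padding(NUM_SAMPLES, HOP_LENGTH)
print(unpadded, centered)  # prints "98 101"
```

If this is indeed the cause, the frame counts should agree once both sides use the same padding convention. The relevant switches are likely the `center` argument of the `torch.stft`-based transforms and the `snip_edges` argument of `torchaudio.compliance.kaldi.fbank`, but which setting each implementation actually uses here is a guess. Window type, dither, pre-emphasis, and the mel filterbank definition can still make the values differ even when the shapes match.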