zycv / OpenSpeaker

OpenSpeaker is a completely independent and open source speaker recognition project. It provides the entire process of speaker recognition including multi-platform deployment and model optimization.
Apache License 2.0

padding ? #6

Open dragen1860 opened 2 years ago

dragen1860 commented 2 years ago

Dear author: I tried to compare your fbank with torchaudio's fbank. For an input of shape [1, 16000], your implementation outputs [1, 98, 80], whereas torchaudio outputs [101, 80]. I guess the padding differs between the two. Could you give some tips on how to align these two implementations? Thank you very much.

zycv commented 2 years ago

Hi dragen1860, the feature extraction in SpeechBrain is indeed different from the inference code, so I opened a new branch and revised this part. You can refer to it here: https://github.com/zycv/speechbrain/blob/OpenSpeaker/speechbrain/lobes/features.py#L139 Since these revisions were made a long time ago, please leave a message if you have any questions.
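
For readers who hit the same mismatch, the two frame counts in the question (98 and 101) are consistent with one extractor framing the raw signal and the other padding it by half a frame on each side first. The sketch below only reproduces that arithmetic. It assumes a 25 ms frame (400 samples) and a 10 ms hop (160 samples) at 16 kHz; the thread does not state these values, and the function names are hypothetical, not taken from OpenSpeaker, SpeechBrain, or torchaudio.

```python
# Frame-count arithmetic for a framed (STFT/fbank-style) feature extractor.
# Assumed parameters (not stated in the thread): 16 kHz audio,
# 400-sample frames (25 ms), 160-sample hop (10 ms).

def frames_without_padding(num_samples, frame_length, hop_length):
    """Number of full frames that fit in the signal with no padding."""
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // hop_length


def frames_with_center_padding(num_samples, frame_length, hop_length):
    """Number of frames when frame_length // 2 samples are padded on each side."""
    pad = frame_length // 2
    return frames_without_padding(num_samples + 2 * pad, frame_length, hop_length)


FRAME_LENGTH = 400   # assumed: 25 ms at 16 kHz
HOP_LENGTH = 160     # assumed: 10 ms at 16 kHz
NUM_SAMPLES = 16000  # the [1, 16000] input from the question

no_pad = frames_without_padding(NUM_SAMPLES, FRAME_LENGTH, HOP_LENGTH)
centered = frames_with_center_padding(NUM_SAMPLES, FRAME_LENGTH, HOP_LENGTH)
print(no_pad, centered)  # 98 101
```

Under these assumptions, aligning the two outputs comes down to making the padding agree: either pad the waveform by half a frame on each side before the unpadded extractor, or turn off the padding in the other one. The exact option names depend on the library version, so check the documentation of the functions you are actually calling.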