Kaldi Fbanks - Githubissues

wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Apache License 2.0


Hi, thanks for the repository! I have two questions:

1. Is there a specific reason why Kaldi fbanks were used for training and inference? Did you try other implementations, such as torchaudio's MelSpectrogram?
2. I understand that kaldifeat is the fastest open-source implementation, but it still slows down the whole pipeline, even when it runs on the GPU, because of the memory transfers between the two models in Triton Server. I think the fastest approach would be to implement fbank in pure PyTorch, convert it to ONNX, and, when exporting the speaker verification model to ONNX, combine the fbank ONNX model with the speaker verification ONNX model. That would avoid all the unnecessary memory movement in Triton Server.
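To make the second point concrete, here is a minimal sketch of what I mean, not code from wespeaker. The names `LogMelFbank`, `TinySpeakerModel`, `WaveformToEmbedding` and `export_combined` are all hypothetical, and `TinySpeakerModel` is only a placeholder for the real speaker model. The front end computes log-mel features with a `conv1d` over a windowed DFT basis, so it uses only operators that usually export cleanly to ONNX. It is not numerically identical to Kaldi fbank (no dither, pre-emphasis, or DC offset removal, and a plain Hamming window), so a model trained on Kaldi features would probably need to be re-validated or retrained.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def mel_filterbank(n_mels, n_fft, sample_rate, low_hz=20.0):
    """Triangular filters in the mel domain, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(hz):
        return 1127.0 * math.log(1.0 + hz / 700.0)

    n_bins = n_fft // 2 + 1
    bin_mels = torch.tensor([hz_to_mel(k * sample_rate / n_fft) for k in range(n_bins)])
    edges = torch.linspace(hz_to_mel(low_hz), hz_to_mel(sample_rate / 2), n_mels + 2)
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    up = (bin_mels[None, :] - left) / (center - left)
    down = (right - bin_mels[None, :]) / (right - center)
    return torch.clamp(torch.minimum(up, down), min=0.0)


class LogMelFbank(nn.Module):
    """Log-mel front end (hypothetical; an approximation, not Kaldi-exact)."""

    def __init__(self, sample_rate=16000, n_mels=80, win_length=400,
                 hop_length=160, n_fft=512):
        super().__init__()
        self.n_bins = n_fft // 2 + 1
        self.hop_length = hop_length
        t = torch.arange(win_length, dtype=torch.float32)[None, :]
        k = torch.arange(self.n_bins, dtype=torch.float32)[:, None]
        angle = 2.0 * math.pi * k * t / n_fft
        window = torch.hamming_window(win_length, periodic=False)
        # Real and imaginary DFT rows, pre-multiplied by the window.
        basis = torch.cat([torch.cos(angle), -torch.sin(angle)], dim=0) * window
        self.register_buffer("basis", basis[:, None, :])  # (2 * n_bins, 1, win_length)
        self.register_buffer("mel", mel_filterbank(n_mels, n_fft, sample_rate))

    def forward(self, wav):
        # wav: (batch, samples). One conv1d does framing, windowing and the DFT.
        spec = F.conv1d(wav[:, None, :], self.basis, stride=self.hop_length)
        power = spec[:, :self.n_bins] ** 2 + spec[:, self.n_bins:] ** 2
        mel = torch.matmul(self.mel, power)  # (batch, n_mels, frames)
        return torch.log(mel.clamp(min=1e-10)).transpose(1, 2)  # (batch, frames, n_mels)


class TinySpeakerModel(nn.Module):
    """Placeholder for the real speaker model: mean pooling plus a linear layer."""

    def __init__(self, n_mels=80, emb_dim=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, emb_dim)

    def forward(self, feats):
        return self.proj(feats.mean(dim=1))


class WaveformToEmbedding(nn.Module):
    """Chains the front end and the speaker model so they export as one graph."""

    def __init__(self, frontend, speaker_model):
        super().__init__()
        self.frontend = frontend
        self.speaker_model = speaker_model

    def forward(self, wav):
        return self.speaker_model(self.frontend(wav))


def export_combined(model, path="wav_to_embedding.onnx"):
    """Export the combined module to a single ONNX file (path is an assumption)."""
    dummy = torch.randn(1, 16000)
    torch.onnx.export(
        model, (dummy,), path,
        input_names=["wav"], output_names=["embedding"],
        dynamic_axes={"wav": {0: "batch", 1: "samples"}},
        opset_version=13,
    )
```

With something like this, `WaveformToEmbedding(LogMelFbank(), real_model).eval()` could be passed to `export_combined`, and Triton would serve one ONNX model that takes raw waveforms, so the features never leave the device between two models. I have not benchmarked this against kaldifeat.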
