Silence detection (VAD)

mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.


Silence detection (VAD) #221

Closed: matthewkperez closed this issue 4 years ago

matthewkperez commented 4 years ago

Hello, I want to train a VAD alongside the DNN senone classifier. My current idea is to create a custom loss function in which all the pdf-ids that map to silence are read as 0s and all the non-silence pdf-ids are read as 1s (binary classification for the VAD).
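The relabeling described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the set of silence pdf-ids is hypothetical (it has to come from the Kaldi model), and `vad_targets` and `binary_cross_entropy` are names invented here, not part of pytorch-kaldi.

```python
import math

def vad_targets(pdf_ids, silence_pdf_ids):
    """Map frame-level pdf-id labels to binary VAD targets (0 = silence, 1 = speech)."""
    return [0 if p in silence_pdf_ids else 1 for p in pdf_ids]

def binary_cross_entropy(speech_probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted speech probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(speech_probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

# Hypothetical example: pdf-ids 0, 1 and 2 are ASSUMED to belong to silence phones.
silence_pdf_ids = {0, 1, 2}
frame_pdf_ids = [0, 2, 17, 45, 45, 1]          # senone labels for six frames
targets = vad_targets(frame_pdf_ids, silence_pdf_ids)

# Made-up speech probabilities, standing in for the output of a VAD head.
loss = binary_cross_entropy([0.1, 0.2, 0.9, 0.8, 0.7, 0.3], targets)
```

In an actual training setup the same lookup would more likely be done on tensors, and the binary loss would be added to the senone loss with some weight; both choices are left open here.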

Is there a way to get the corresponding phone for each pdf-id?

TParcollet commented 4 years ago

Hi, sorry for the late reply. We are truly busy with the new project, SpeechBrain.

This is definitely not something that can be extracted from what we implemented in PyTorch-Kaldi. What I mean is that you will probably need to call some Kaldi tools to obtain the phoneme corresponding to each pdf-id. You should ask on the Kaldi Google group to get the fastest solution, since it might affect the training time.
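One possible way to do this, offered as a sketch rather than a confirmed recipe: Kaldi's `show-transitions` tool prints, for each transition-state, the phone and the pdf-id it uses, and that text can be parsed offline before training. The line format below is an assumption and should be checked against the output of your own Kaldi build; the silence phone names (`SIL`) and the helper names `pdf_to_phones` and `silence_pdfs` are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMED line format of `show-transitions phones.txt final.mdl`:
#   Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
# Some models may print "forward-pdf = N self-loop-pdf = M" instead of "pdf = N";
# PDF_RE below matches both spellings.
STATE_RE = re.compile(r"^Transition-state \d+: phone = (\S+) hmm-state = \d+ (.*)$")
PDF_RE = re.compile(r"pdf = (\d+)")

def pdf_to_phones(show_transitions_text):
    """Return {pdf_id: set of phone names} parsed from show-transitions output."""
    mapping = defaultdict(set)
    for line in show_transitions_text.splitlines():
        match = STATE_RE.match(line.strip())
        if not match:
            continue  # skip "Transition-id = ..." and any other lines
        phone, rest = match.groups()
        for pdf in PDF_RE.findall(rest):
            mapping[int(pdf)].add(phone)
    return dict(mapping)

def silence_pdfs(mapping, silence_phones):
    """Pdf-ids whose phones are all silence phones (pdfs can be shared by several phones)."""
    return {pdf for pdf, phones in mapping.items() if phones <= set(silence_phones)}

# Made-up sample output, for illustration only.
sample = """\
Transition-state 1: phone = SIL hmm-state = 0 pdf = 0
 Transition-id = 1 p = 0.9 [self-loop]
 Transition-id = 2 p = 0.1 [0 -> 1]
Transition-state 2: phone = SIL hmm-state = 1 pdf = 1
Transition-state 3: phone = AA hmm-state = 0 pdf = 17
Transition-state 4: phone = AE hmm-state = 0 pdf = 17
"""
mapping = pdf_to_phones(sample)
sil_ids = silence_pdfs(mapping, ["SIL"])
```

Because the mapping is computed once from the model, before training starts, it should not add any per-batch cost. With position-dependent phones the names carry suffixes such as `SIL_B`, so the silence phone list would need to include those variants.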