kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Speaker diarization for test wav files while using pre-trained model from callhome_v2 #4400

Closed: talal-sen closed this issue 3 years ago

talal-sen commented 3 years ago

Hi, I am still new to Kaldi. I would like to perform diarization on some speech samples from my own dataset. They do not have any speaker labels, so I would have to listen to them and compare what I hear against the diarization output. I have a question about this:

a) Does it make sense to use a pre-trained model, such as the callhome_v2 model, given that there may be different recording conditions, dialects and possibly languages? Or are we assuming that the pre-trained model has learned generalizable features (x-vectors), so that it works well even on an unseen dataset?

Thanks in advance

desh2608 commented 3 years ago

As long as the x-vector extractor is trained on a large enough corpus of speakers, it can be used for extraction on other data. We often use x-vector extractors trained on VoxCeleb (augmented with noise and reverb) for diarization of other datasets (such as AMI), and they seem to work well enough. If you are using a PLDA-based backend, you could train or adapt the PLDA on your in-domain data, but such adaptation is generally not required for the embedding extractor itself. The only thing to be careful about, IMO, is bandwidth: if you have wideband audio, you should use an extractor trained on wideband data (and likewise for narrowband).
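
To make the bandwidth point concrete, here is a minimal sketch of a check you could run on your own wav files before extraction. It is not part of Kaldi: it only uses Python's standard `wave` module, the function names are hypothetical, and the 8 kHz cut-off between "narrowband" and "wideband" is an assumption (telephone-style audio is typically 8 kHz, VoxCeleb-style audio 16 kHz). It also only reads uncompressed PCM wav files.

```python
import wave

# Assumption: anything sampled at 8 kHz or below is treated as narrowband
# (telephone-style audio); anything above is treated as wideband.
NARROWBAND_MAX_HZ = 8000


def bandwidth_class(sample_rate_hz: int) -> str:
    """Label a sample rate as 'narrowband' or 'wideband'."""
    return "narrowband" if sample_rate_hz <= NARROWBAND_MAX_HZ else "wideband"


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from the header of a PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_extractor(path: str, extractor_rate_hz: int) -> bool:
    """True if the file falls in the same bandwidth class as the
    data the extractor was trained on (extractor_rate_hz)."""
    return bandwidth_class(wav_sample_rate(path)) == bandwidth_class(extractor_rate_hz)
```

For example, `matches_extractor("my_recording.wav", 8000)` (with a hypothetical file name) tells you whether the recording is in the same bandwidth class as an extractor trained on 8 kHz data. If it returns `False`, the options are to pick an extractor trained on matching data or to resample the audio first.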