vimalmanohar / old-kaldi-git

This is not the official kaldi repository. It is better to fork https://github.com/kaldi-asr/kaldi or https://github.com/vimalmanohar/kaldi instead.

diarization with PLDA #6

Closed gorinars closed 7 years ago

gorinars commented 8 years ago

I am trying to adopt your diarization recipe from the asr-diarization branch: kaldi-git/egs/aspire/s5/local/run_diarization.sh

Just curious: is there a specific reason why PLDA and automatic detection of the number of speakers were not used? The code mentions that speaker-diarization_v2 should be used for that, but I am not sure there is a corresponding binary.

vimalmanohar commented 8 years ago

I think it wasn't giving good performance because there was a mismatch between the training and test speakers in this case. Also, in most cases the number of speakers was 2 or 3.

gorinars commented 8 years ago

I am just interested in an option to decide the number of speakers automatically. It seems that conventional k-means does not support this. Would it be possible to commit speaker-diarization_v2 somewhere? I would really appreciate it.
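
To make the request concrete, here is a minimal sketch of one common way to let the clustering decide the speaker count: agglomerative clustering that stops merging once the closest pair of clusters is farther apart than a threshold. This is not the speaker-diarization_v2 binary and it does not use PLDA; cosine distance stands in for the scoring, and `cluster_with_threshold`, the `threshold` value, and the toy 2-D vectors are all hypothetical choices for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_with_threshold(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Merging stops when the closest pair of clusters is farther apart
    than `threshold`, so the number of clusters is not fixed in advance.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # Remaining clusters are treated as distinct speakers.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Convert the cluster membership lists into one label per segment.
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 2-D "embeddings", one per speech segment (made-up values).
segments = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (-1.0, 0.1)]
labels = cluster_with_threshold(segments, threshold=0.3)
num_speakers = len(set(labels))
```

With this toy data the estimated speaker count falls out of the threshold rather than being passed in. In practice the threshold would have to be tuned on held-out data, and the distance would presumably be replaced by whatever score the real system uses.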

vimalmanohar commented 8 years ago

I do not seem to have that binary anymore. Perhaps @david-ryan-snyder, who wrote that binary, might have it somewhere.

david-ryan-snyder commented 8 years ago

In Vimal's branch, I think diarization refers to a more general procedure that is closely related to speech activity detection. Speaker diarization is not something we've developed a good, or even decent, recipe for. However, we (mostly @mmaciej2) recently started working on it, and we hope to have a simple recipe in a month or two. If you saw references to speaker-diarization_v2, it was probably a very simple binary that just clusters some ivectors given a fixed number of speakers.

I suggest keeping an eye out for it in the next month or two.
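
For readers unfamiliar with what "clusters some ivectors given a fixed number of speakers" amounts to, a rough sketch using plain k-means is below. It is an assumption that k-means with Euclidean distance is a fair stand-in; the original binary is not available in this thread, and `kmeans_fixed_speakers`, the farthest-first initialisation, and the toy 2-D vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_fixed_speakers(vectors, num_speakers, iterations=20):
    """Plain k-means: the caller must supply the number of speakers."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    centroids = [vectors[0]]
    while len(centroids) < num_speakers:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(num_speakers), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(num_speakers):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Toy 2-D stand-ins for ivectors, one per segment (made-up values).
ivectors = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels = kmeans_fixed_speakers(ivectors, num_speakers=2)
```

Note that `num_speakers` is an input here, which is exactly the limitation raised earlier in the thread: the method cannot discover the speaker count on its own.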

gorinars commented 8 years ago

Thank you. I will keep an eye out, as you suggest. Good luck with this work.