Closed: saumyaborwankar closed this issue 2 years ago
What is the issue?
You only need to use one of those PLDA models for your system. Also, if you have enough in-domain training data, you'll probably get better results by training a new PLDA model. If your data is wideband microphone data, you might even have better luck with a different x-vector system, such as this one: http://kaldi-asr.org/models/m7. It was developed for speaker recognition, but it should work just fine for diarization as well.
In egs/callhome_diarization, we split the evaluation dataset into two halves so that each half can serve as a development set for the other. Callhome is split into callhome1 and callhome2. We train a PLDA backend (let's call it backend1) on callhome1, and tune the stopping threshold so that it minimizes the error on callhome1. Then backend1 is used to diarize callhome2. Next, we do the same thing in the other direction: backend2 is developed on callhome2 and evaluated on callhome1. The results are concatenated at the end so that we can evaluate on the entire dataset. It doesn't matter that the two backends assign different labels to different speakers, since they diarized different recordings.
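The two-fold setup described above can be sketched roughly as follows. This is only an illustration of the idea, not the recipe's actual scripts: `train_backend`, `diarize`, and `score` are hypothetical callables you would supply yourself, and the candidate thresholds are assumed to be given.

```python
def tune_threshold(backend, dev_recordings, thresholds, diarize, score):
    """Pick the stopping threshold with the lowest total error on the dev half."""
    def total_error(threshold):
        return sum(score(rec, diarize(backend, rec, threshold))
                   for rec in dev_recordings)
    return min(thresholds, key=total_error)


def two_fold_diarization(half1, half2, thresholds, train_backend, diarize, score):
    """Develop a backend on each half, then use it to diarize the other half.

    All callables are hypothetical placeholders:
      train_backend(recordings) -> backend
      diarize(backend, recording, threshold) -> hypothesis
      score(recording, hypothesis) -> error (lower is better)
    """
    results = {}
    for dev, test in ((half1, half2), (half2, half1)):
        backend = train_backend(dev)  # e.g. backend1 from callhome1
        best = tune_threshold(backend, dev, thresholds, diarize, score)
        for rec in test:
            # Each recording is diarized by the backend that never saw it.
            results[rec] = diarize(backend, rec, best)
    # "Concatenation": one hypothesis per recording across both halves.
    return results
```

Because every recording is diarized by exactly one backend, the combined results can be scored as a single set even though the speaker labels from the two backends are unrelated.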
Regarding the short segment, I think the issue is that your SAD has determined that there's a speech segment from 24.99 to 25.43 and a separate speech segment starting at 25.51. It might be a good idea to smooth these SAD decisions earlier in the pipeline (e.g., in your SAD system itself) to avoid having adjacent segments with small gaps between them. Increasing the min-segment threshold might cause the diarization system to throw out this segment, but to me it seems preferable to keep it, and just merge it with the adjacent segment. But this stuff requires a lot of tuning to get right, and it's hard to say what the optimal strategy is without playing with the data myself.
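As a rough sketch of the kind of smoothing meant here, the snippet below merges speech segments whose gap is below a tolerance. The `max_gap` value of 0.1 s and the end time of the second segment (27.80) are assumptions made up for illustration; only the 24.99, 25.43, and 25.51 boundaries come from the discussion above.

```python
def merge_close_segments(segments, max_gap=0.1):
    """Merge speech segments separated by a gap of at most max_gap seconds.

    segments: iterable of (start, end) tuples in seconds.
    Returns a new list of merged (start, end) tuples, sorted by start time.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap is small enough: extend the previous segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# The 27.80 end time is hypothetical; the original only gives the start (25.51).
sad_segments = [(24.99, 25.43), (25.51, 27.80)]
print(merge_close_segments(sad_segments, max_gap=0.1))  # [(24.99, 27.8)]
```

A suitable `max_gap` depends on the data and the SAD system, so it would need tuning like everything else in the pipeline.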
By the way, what is this "nasa_telescopes" dataset you're using?
Originally posted by @david-ryan-snyder in https://github.com/kaldi-asr/kaldi/issues/2523#issuecomment-409597254