hitachi-speech / EEND

End-to-End Neural Diarization
MIT License

Experiment results on multiple speakers dataset #6

Closed (mcompute closed this issue 2 years ago)

mcompute commented 4 years ago

Great paper! I enjoyed reading it and like the idea of using a simple model to solve the speaker diarization problem.

I noticed that your model can handle multiple speakers, and I wonder whether you have benchmarked its performance against state-of-the-art techniques on datasets with more than 2 speakers. I would appreciate it if you could share experiment results on datasets with a larger number of speakers. :-)

priyankagutte commented 4 years ago

Hi. Is a pre-trained model available for this?

yubouf commented 4 years ago

Sorry for the late response. We have done experiments with more than 2 speakers, using the CALLHOME dataset: https://arxiv.org/abs/2005.09921 https://arxiv.org/abs/2006.01796 Some implementations based on these two papers will be made available.

@priyankagutte We are thinking of providing a pretrained model, but it should be trained on free datasets. Unfortunately, no good models are available for this purpose so far.

875441459 commented 4 years ago

Hi, so can we use a multi-talker dataset to train a 2-speaker diarization system (by extracting each speaker's utterances and then mixing them the way the papers do)?

sw005320 commented 4 years ago

Yes. The recipe directory in the repository also provides a script showing how to mix them.
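For readers who want a feel for the idea before opening the recipe, here is a minimal sketch of mixing two speakers' utterances into one simulated recording with frame-level labels. It is not the repository's script. The function names (`build_track`, `mix_two_speakers`), the 8 kHz sample rate, the 80-sample frame shift, and the exponentially distributed silences are all assumptions made for illustration.

```python
import numpy as np


def build_track(utterances, rng, sample_rate=8000, mean_silence_sec=2.0):
    """Concatenate one speaker's utterances with a random silence before each.

    Returns the waveform and a per-sample 0/1 speech-activity mask.
    (Hypothetical helper; silence lengths are drawn from an exponential
    distribution purely as an illustrative assumption.)
    """
    pieces, masks = [], []
    for utt in utterances:
        gap = int(rng.exponential(mean_silence_sec) * sample_rate)
        pieces += [np.zeros(gap, dtype=np.float32), utt.astype(np.float32)]
        masks += [np.zeros(gap, dtype=np.int8), np.ones(len(utt), dtype=np.int8)]
    return np.concatenate(pieces), np.concatenate(masks)


def mix_two_speakers(utts_a, utts_b, seed=0, sample_rate=8000, frame_shift=80):
    """Sum two single-speaker tracks and derive frame-level labels.

    Returns (mixture, labels), where labels has shape (n_frames, 2) and
    labels[t, s] == 1 if speaker s is active anywhere in frame t.
    """
    rng = np.random.default_rng(seed)
    tracks = [build_track(u, rng, sample_rate) for u in (utts_a, utts_b)]

    n = max(len(wav) for wav, _ in tracks)
    mixture = np.zeros(n, dtype=np.float32)
    activity = np.zeros((n, 2), dtype=np.int8)
    for spk, (wav, mask) in enumerate(tracks):
        mixture[:len(wav)] += wav          # overlapping speech simply adds up
        activity[:len(mask), spk] = mask

    # Collapse per-sample activity to per-frame labels; a trailing partial
    # frame is dropped.
    n_frames = n // frame_shift
    labels = activity[:n_frames * frame_shift]
    labels = labels.reshape(n_frames, frame_shift, 2).max(axis=1)
    return mixture, labels
```

Each list passed in would hold the utterances of a single speaker, extracted from the multi-talker dataset using its segment annotations. A realistic setup would typically also add noise and reverberation, which this sketch leaves out.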

875441459 commented 4 years ago

OK, thanks for your reply~