Closed: kan-cloud closed this issue 3 years ago.
As long as you have labelled waveforms of non-overlapping speakers, you can use the AMI dataset for speaker embedding, but it has a limited number of speakers, so a model trained from scratch on such a dataset isn't going to be very good. It is generally better to train the speaker embedding on another dataset such as VoxCeleb.
Oh, I see why it was trained on VoxCeleb now. Thank you for the prompt reply!
In the tutorial, the AMI dataset is used to train speech activity detection and speaker change detection, whereas the VoxCeleb dataset is used to train the speaker embedding. Does the speaker embedding model necessarily require a different dataset than speech activity and change detection?
I have trained all parts of my diarization pipeline (SAD, SCD, and embedding) on the same dataset (split into train, development, and test subsets) and I am getting very poor results. I was wondering whether that is because I do not have a separate dataset for the speaker embedding.
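One possible contributor to poor results in this setup (a guess, not something confirmed in this thread) is speaker overlap between the subsets: if the same speakers appear in both the embedding training data and the evaluation data, development scores can look reasonable while performance on genuinely unseen speakers collapses. Below is a minimal, self-contained sketch of a speaker-disjoint split; `speaker_disjoint_split` is a hypothetical helper written for illustration, not part of pyannote or any dataset toolkit:

```python
import random

def speaker_disjoint_split(files, train_ratio=0.8, seed=0):
    """Split a list of (utterance_id, speaker_id) pairs so that no
    speaker appears in both subsets. This matters when evaluating a
    speaker embedding: test speakers should be unseen at training time.
    NOTE: this is an illustrative sketch, not a library function."""
    # Collect the unique speakers and shuffle them deterministically.
    speakers = sorted({spk for _, spk in files})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    # Partition speakers (not utterances) into train and test groups.
    n_train = int(len(speakers) * train_ratio)
    train_spk = set(speakers[:n_train])
    train = [f for f in files if f[1] in train_spk]
    test = [f for f in files if f[1] not in train_spk]
    return train, test

# Toy data: 50 utterances spread over 10 speakers.
files = [(f"utt{i}", f"spk{i % 10}") for i in range(50)]
train, test = speaker_disjoint_split(files)
# No speaker is shared between the two subsets.
assert not ({s for _, s in train} & {s for _, s in test})
```

Splitting by speaker rather than by utterance is the key design choice here: an utterance-level split of a small corpus almost guarantees that every test speaker was also seen during training.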