Question about embedding model choice

Hey you, thank you for the package :)

I'm researching how to reduce diarization errors related to overlapping speech, and I'd like to ask about your choice of an embedding model.

Is there any particular reason you picked speechbrain's model instead of pyannote's default model for speaker embedding in your pipeline?

From what I've found, the speechbrain ECAPA-TDNN model gets 0.8% EER for speaker verification on the VoxCeleb benchmark, while the wespeaker resnet34-LM model provided by pyannote gets 0.74% EER on the same benchmark. Is there a big difference in their diarization performance that I'm not aware of? Or is there some other reason for choosing one over the other?

Again, thank you for the code and for the info!

EDIT: I just found that pyannote's pipeline v2.0 requires speechbrain as a dependency. Was the speechbrain model the default back then? Sorry for the stupid question if it was :sweat_smile:.
