meronym / speaker-transcription

Transcription with speaker diarization pipeline
MIT License
79 stars, 17 forks

Question about embedding model choice #12

Open: naripok opened this issue 3 months ago

naripok commented 3 months ago

Hey you, thank you for the package :)

I'm researching how to reduce diarization errors caused by overlapping speech, and I'd like to ask about your choice of embedding model.

Is there a particular reason you picked SpeechBrain's model instead of pyannote's default speaker embedding model in your pipeline?

From my research, SpeechBrain's ECAPA-TDNN model achieves 0.8% EER for speaker verification on the VoxCeleb benchmark, while the WeSpeaker ResNet34-LM model provided by pyannote achieves 0.74% EER on the same benchmark. Is there a big difference in their diarization performance that I'm not aware of? Or is there another reason for choosing one over the other?
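For context on those numbers: EER (equal error rate) is the verification error at the decision threshold where the false-acceptance rate equals the false-rejection rate, so lower is better. Below is a minimal pure-Python sketch of how it can be estimated from trial scores, assuming each score is the cosine similarity between two utterance embeddings (a scoring method commonly used with ECAPA-TDNN and ResNet speaker embeddings). The function names are hypothetical, and real benchmark tooling usually interpolates the ROC curve rather than picking the nearest observed threshold as this sketch does.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / norm


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER from same-speaker (target) and different-speaker
    (non-target) trial scores. A trial is accepted when score >= threshold."""
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap = float("inf")
    best_eer = 1.0
    # Try every observed score as a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Toy trial scores, not real VoxCeleb results.
    print(equal_error_rate([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]))
```

Note that EER is measured on speaker-verification trials, so it does not directly capture how an embedding model behaves on the overlapped segments a diarization pipeline has to cluster.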

Again, thank you for the code and for the info!

EDIT: I just found out that pyannote's v2.0 pipeline requires speechbrain as a dependency. Was it the default embedding model back then? Sorry for the stupid question if it was :sweat_smile:.

MaximeDde commented 2 months ago

Hi @naripok, not an exact answer, but from what I've been testing, SpeechBrain's model seems to be more accurate on languages other than English (I've been using it specifically for French transcription, and its diarization accuracy is much higher than that of pyannote's speaker diarization 3.1 pipeline).

Hope that helps a little in choosing a model!