thomasmol / cog-whisper-diarization

Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote
https://replicate.com/thomasmol/whisper-diarization
153 stars 44 forks

Diminution of the speaker diarization precision #8

Closed: MaximeDde closed this issue 4 months ago

MaximeDde commented 5 months ago

Hello Thomas, as usual, thank you for your work.

This is quite a simple "review", and I am sorry I can't provide more details, but I've noticed a decrease in the model's ability to distinguish the speakers correctly...

Over the last two months, I've seen a rise of about 60% in complaints about bad speaker diarization. Is it related to Whisper-v3, or did some of the pyannote diarization parameters change? If so, do you think you could expose those parameters on the model? That would be useful for fine-tuning it!

Thank you again for your huge work; it is extremely useful for what I'm building! Regards

thomasmol commented 5 months ago

Can you point to the rise in complaints? Also, you can fine-tune Pyannote and use your own models as well. The pipeline of this repo is just faster-whisper + pyannote, so it should not be too complex to build your own pipeline with fine-tuned models.
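
For anyone building such a pipeline, here is a minimal sketch of the step that joins the two halves: giving each transcript segment the speaker whose diarization turns overlap it the most. It is an illustration under assumptions, not necessarily how this repo merges its outputs. The tuple layouts, the `assign_speakers` helper and the `"UNKNOWN"` fallback label are hypothetical; in a real pipeline the segments would come from faster-whisper and the turns from pyannote (possibly a fine-tuned model).

```python
from typing import Dict, List, Tuple

# Hypothetical plain-data layouts, chosen for illustration only:
# (start_seconds, end_seconds, text) for a transcript segment
TranscriptSegment = Tuple[float, float, str]
# (start_seconds, end_seconds, speaker_label) for a diarization turn
SpeakerTurn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the shared duration of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[TranscriptSegment],
    turns: List[SpeakerTurn],
) -> List[Tuple[str, str]]:
    """Label each segment with the speaker that overlaps it the longest."""
    labeled = []
    for start, end, text in segments:
        # Accumulate overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        best = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((best, text))
    return labeled


if __name__ == "__main__":
    segments = [(0.0, 2.0, "Hello"), (2.0, 5.0, "Hi there")]
    turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 6.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Matching on the largest overlap is only one possible choice. Matching word by word instead of segment by segment may handle quick speaker changes better, at the cost of needing word-level timestamps.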

MaximeDde commented 5 months ago

It's mostly complaints I get from the people testing my tool, which I'm centralizing and counting! Well yes, I guess I may give it a try, I'm just not very confident with this yet haha... I'll give it a go, and keep you posted if I find a configuration that works well for me!