pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

False Alarms vs Misses #1584

Open picheny-nyu opened 6 months ago

picheny-nyu commented 6 months ago

I have a diarization application in which I would prefer fewer false alarms, even at the expense of more misses. Can this trade-off be controlled during fine-tuning?

Thanks Michael

hbredin commented 6 months ago

Not in 3.x, no.

I am considering adding back the option but cannot provide an ETA though.

Can you say more about your use case?

picheny-nyu commented 6 months ago

I am using the output to identify sections of speech in parent-toddler conversations, which are then transcribed and used as input for unsupervised speech recognition fine-tuning. I figure it is better to miss questionable segments than to train on false alarms.

hbredin commented 6 months ago

I would then use pyannote/segmentation for this purpose, wrapped in a voice activity detection pipeline that comes with onset/offset thresholds:

https://huggingface.co/pyannote/segmentation#voice-activity-detection
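
To illustrate why onset/offset thresholds give control over the false alarm vs. miss trade-off, here is a minimal sketch of hysteresis thresholding on frame-level speech scores. It is a plain-Python illustration of the general idea, not pyannote's implementation: the `binarize` function, the score values, and the threshold settings are all hypothetical.

```python
from typing import List, Tuple


def binarize(
    scores: List[float], onset: float = 0.5, offset: float = 0.5
) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn frame-level speech scores into regions.

    A region opens when the score rises above `onset` and closes when
    it falls below `offset`. Regions are (start_frame, end_frame) pairs,
    with end_frame exclusive.
    """
    regions = []
    active = False
    start = 0
    for i, score in enumerate(scores):
        if not active and score > onset:
            # Score is high enough to start a speech region.
            active, start = True, i
        elif active and score < offset:
            # Score dropped low enough to end the current region.
            regions.append((start, i))
            active = False
    if active:
        # Close a region that is still open at the end of the signal.
        regions.append((start, len(scores)))
    return regions


# Made-up scores: a questionable region (frames 1-3) and a confident one (frames 6-8).
scores = [0.1, 0.6, 0.7, 0.55, 0.2, 0.1, 0.9, 0.95, 0.85, 0.3]

lenient = binarize(scores, onset=0.5, offset=0.5)
strict = binarize(scores, onset=0.8, offset=0.6)
```

With these made-up scores, the lenient setting keeps both regions, while the stricter onset drops the questionable one. That is the behavior wanted here: fewer false alarms, at the cost of possibly missing some real speech.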

picheny-nyu commented 6 months ago

Thanks. I do need diarization, though: I want to process the adult and toddler speech separately. Would you suggest I just use an older (downleveled) version of the diarization pipeline that still uses VAD?

hbredin commented 6 months ago

Yes.

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.