pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Citation of "Binarize predictions using onset/offset thresholding" #82

Closed (fhahaha closed this issue 6 years ago)

fhahaha commented 6 years ago

Hi Bredin,

I want to cite your idea of "Binarize predictions using onset/offset thresholding" for speech activity detection. I think it is quite interesting compared to using a single threshold.

However, I could not find a description of it in either of your two papers, "TristouNet: Triplet Loss for Speaker Turn Embedding" and "Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks".

Could you tell me which of your papers introduces this idea, so that I can cite it in my work?

Many thanks.

hbredin commented 6 years ago

This is not something I invented.

This is similar to « hysteresis thresholding » usually applied in image processing (e.g. for edge detection).

This has been used before by @GregGovit for SAD in https://pdfs.semanticscholar.org/6c15/74eff91ae3df8d8b025d72eae7dc5dd2e26d.pdf with even more parameters (padding, minimum speech/non-speech duration, etc.)
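
For readers unfamiliar with the idea, here is a minimal sketch of onset/offset (hysteresis) thresholding on a sequence of per-frame speech scores. It is an illustration only, not the pyannote-audio implementation: the function name `binarize_hysteresis`, the default thresholds, and the choice to return frame-index pairs are all assumptions, and the extra parameters mentioned above (padding, minimum durations) are left out.

```python
from typing import List, Sequence, Tuple


def binarize_hysteresis(
    scores: Sequence[float],
    onset: float = 0.7,   # hypothetical default: score needed to START a region
    offset: float = 0.3,  # hypothetical default: score below which a region ENDS
) -> List[Tuple[int, int]]:
    """Return active regions as (start_frame, end_frame) pairs, end exclusive.

    A region opens when the score rises above `onset` and only closes
    once the score falls below `offset`. With onset == offset this
    reduces to ordinary single-threshold binarization.
    """
    regions: List[Tuple[int, int]] = []
    active = False
    start = 0
    for i, score in enumerate(scores):
        if not active and score > onset:
            # Inactive -> active: the score crossed the (higher) onset threshold.
            active = True
            start = i
        elif active and score < offset:
            # Active -> inactive: the score dropped under the (lower) offset threshold.
            regions.append((start, i))
            active = False
    if active:
        # Close a region that is still open at the end of the sequence.
        regions.append((start, len(scores)))
    return regions


# Scores that hover around 0.5 after a clear speech onset.
scores = [0.2, 0.8, 0.45, 0.55, 0.45, 0.6, 0.1]

# Two thresholds: one stable region.
hysteresis_regions = binarize_hysteresis(scores, onset=0.7, offset=0.3)

# Single threshold at 0.5: the same scores flicker into three short regions.
single_threshold_regions = binarize_hysteresis(scores, onset=0.5, offset=0.5)
```

With these example scores, the two-threshold version yields one region covering frames 1 to 5, while the single threshold at 0.5 splits the same stretch into three one-frame regions. That reduced flicker is the usual motivation for using separate onset and offset values.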

fhahaha commented 6 years ago

Wow! Thank you.