This is not something I invented.
It is similar to "hysteresis thresholding", which is commonly applied in image processing (e.g. for edge detection).
It has been used before by @GregGovit for speech activity detection (SAD) in https://pdfs.semanticscholar.org/6c15/74eff91ae3df8d8b025d72eae7dc5dd2e26d.pdf, with even more parameters (padding, minimum speech/non-speech duration, etc.).
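For readers unfamiliar with the idea, here is a minimal sketch of onset/offset (hysteresis) thresholding on a sequence of frame-level scores. It is only an illustration, not the actual implementation discussed in this thread: the function name `binarize_hysteresis` and the default values `onset=0.7` and `offset=0.3` are assumptions made for the example, and it leaves out extras such as padding and minimum durations.

```python
def binarize_hysteresis(scores, onset=0.7, offset=0.3):
    """Return (start, end) frame index pairs of active regions (end exclusive).

    A region opens when the score rises above `onset` and closes only
    when the score falls below `offset` (with offset <= onset).
    Hypothetical helper written for illustration.
    """
    regions = []
    active = False
    start = 0
    for i, score in enumerate(scores):
        if not active and score > onset:
            # Score crossed the (higher) onset threshold: open a region.
            active = True
            start = i
        elif active and score < offset:
            # Score dropped below the (lower) offset threshold: close it.
            active = False
            regions.append((start, i))
    if active:
        # Close a region that is still open at the end of the sequence.
        regions.append((start, len(scores)))
    return regions


scores = [0.1, 0.8, 0.5, 0.4, 0.2, 0.6, 0.9, 0.5]
print(binarize_hysteresis(scores))  # [(1, 4), (6, 8)]
```

In this example the frames scoring 0.5 and 0.4 stay inside the first region because they never fall below `offset`, while the isolated 0.6 does not open a region because it never exceeds `onset`. A single threshold at 0.5 would split these frames differently.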
wow! Thank you
Hi Bredin,
I would like to cite your idea of "Binarize predictions using onset/offset thresholding" for speech activity detection. I think it is quite interesting compared to a single threshold.
However, I could not find a description of it in either of your two papers, "Tristounet: Triplet Loss for Speaker Turn Embedding" and "Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks".
Could you tell me which of your papers describes this idea, so that I can cite it in my work?
Many thanks.