glynpu closed this issue 7 years ago.
pyannote-speech-detection can be used to do "voice activity detection" (VAD).
pyannote-change-detection can be used to do "speaker change detection" (SCD).
Both tasks can be seen as 2-class classification problems and are addressed using the same LSTM approach described in https://github.com/yinruiqing/change_detection/blob/master/doc/change-detection.pdf
The only difference is in how the groundtruth sequence is defined:
- VAD: class 0 = non-speech, class 1 = speech
- SCD: class 0 = no speaker change, class 1 = speaker change
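To make the two label definitions concrete, here is a minimal sketch that turns the same toy annotation into a frame-level VAD target and a frame-level SCD target. It is not pyannote's code: the `(start_frame, end_frame, speaker)` segment format, the choice to place a change at the first frame of a new speaker's segment, and the `collar` parameter (which also marks neighbouring frames as changes) are all hypothetical simplifications.

```python
from typing import List, Optional, Tuple

# Hypothetical segment format: (start_frame, end_frame_exclusive, speaker).
# This is NOT pyannote's Annotation API, just a toy stand-in.
Segment = Tuple[int, int, str]


def vad_targets(segments: List[Segment], n_frames: int) -> List[int]:
    """Class 1 = speech (frame covered by some segment), class 0 = non-speech."""
    y = [0] * n_frames
    for start, end, _speaker in segments:
        for t in range(max(0, start), min(end, n_frames)):
            y[t] = 1
    return y


def scd_targets(segments: List[Segment], n_frames: int, collar: int = 0) -> List[int]:
    """Class 1 = speaker change, class 0 = no speaker change.

    Assumption: a change is placed at the first frame of each segment whose
    speaker differs from the previous segment's speaker; the hypothetical
    `collar` also labels the `collar` frames on each side as changes.
    """
    y = [0] * n_frames
    previous: Optional[str] = None
    for start, _end, speaker in sorted(segments):  # sorted by start frame
        if previous is not None and speaker != previous:
            for t in range(start - collar, start + collar + 1):
                if 0 <= t < n_frames:
                    y[t] = 1
        previous = speaker
    return y


if __name__ == "__main__":
    segs = [(0, 3, "A"), (5, 8, "B"), (8, 10, "A")]
    print(vad_targets(segs, 10))  # [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
    print(scd_targets(segs, 10))  # [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
```

Both outputs are binary sequences of the same length, which is why one sequence-labelling LSTM can be trained on either; note how sparse the SCD target is compared with the VAD one.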
However, in practice, SCD is implemented as a 1-dimensional regression task that only predicts class 1, hence n_classes: 1.
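Why one output is enough for a 2-class problem: a softmax over two scores `[0, z]` gives class 1 the probability `sigmoid(z)`, so a single score carries the same information as two. The sketch below compares an `n_classes: 2` style head (two logits, softmax cross-entropy) with an `n_classes: 1` style head (one logit). The thread calls the latter "regression" without naming the loss, so pairing the single output with sigmoid + binary cross-entropy here is an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def two_class_loss(logits: List[float], target: int) -> float:
    """n_classes: 2 style head: one logit per class, softmax cross-entropy.

    -log softmax(logits)[target] = logsumexp(logits) - logits[target]
    """
    m = max(logits)
    logsumexp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logsumexp - logits[target]


def one_output_loss(logit: float, target: int) -> float:
    """n_classes: 1 style head: a single logit for class 1.

    Assumed loss: sigmoid + binary cross-entropy, written stably as
    -log sigmoid(z) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z).
    """
    return softplus(-logit) if target == 1 else softplus(logit)


if __name__ == "__main__":
    z = 1.5
    for target in (0, 1):
        # The two heads give the same loss when the class-0 logit is fixed at 0.
        print(target, two_class_loss([0.0, z], target), one_output_loss(z, target))
```

The design choice is mostly about the output layer: with one unit, "class 0" is simply `1 - p(class 1)`, so nothing is lost by dropping the second unit.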
Sorry I don't have time to provide more details...
Hi, Bredin,
I am new to LSTMs. After reading your paper (in the citation) about TristouNet, I think I have the basic idea of how it works for speaker change detection. However, I am still confused about the theoretical background of the pyannote-speech-detection command.
Can I say that speech detection is similar to speaker turn detection, if non-speech segments are considered to come from a special 'speaker' and speech segments from another 'speaker'? In that case, speech boundary detection would be the same as speaker change detection.
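The intuition can be checked on a toy VAD label sequence: if non-speech is treated as one "speaker" and speech as another, the frames where the label flips are exactly that two-"speaker" sequence's change points. The sketch below (hypothetical helper, not part of pyannote) extracts them. Keep in mind that, per the label definitions in the answer, the VAD target is the per-frame state itself, while the SCD target marks only the transitions; also, a direct switch from one speaker to another with no pause never shows up as a VAD boundary.

```python
from typing import List


def boundaries(labels: List[int]) -> List[int]:
    """Frames whose label differs from the previous frame's label.

    On a VAD sequence (0 = non-speech, 1 = speech) these are the
    speech/non-speech boundaries, i.e. the "change points" if non-speech
    is treated as a special speaker.
    """
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]


if __name__ == "__main__":
    vad = [0, 0, 1, 1, 1, 0, 1]
    print(boundaries(vad))  # [2, 5, 6]
```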
However, I still wonder why the parameter setting in config.yml (n_classes: 2) for speech activity detection differs from the setting in config.yml (n_classes: 1) for speaker change detection. To be honest, I don't know what this parameter (n_classes) means. Is there another introduction or tutorial on the theoretical background of the pyannote-speech-detection command?
Thank you for your time and patience.
Liyong Guo