pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Order and dependency for speaker change detection #127

Closed PiotrTa closed 5 years ago

PiotrTa commented 6 years ago

Hi, I am trying to build a speaker change detection algorithm based on pyannote and my own dataset. Do I understand correctly that I could either use speaker-change-detection directly or speaker-embedding (+distance between two sliding windows + threshold + ...)?

Is there any dependency during training between those two methods? Do they use a previously trained speech activity detection model during training, or do they rely on the annotations?

Thank you for the answers.

hbredin commented 6 years ago

You are correct: one can use speaker change detection directly [1] or distance between speaker embeddings of two adjacent sliding windows [2]. Both approaches need a thresholding step at the end.

As long as your dataset comes with "who speaks when" annotations, you do not need to train a speech activity detection first.
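For instance, frame-level training targets for change detection can be derived directly from such annotations. A minimal sketch, assuming turns come as `(start, end, speaker)` tuples in seconds (the helper name, tuple format, and `collar` parameter are illustrative, not the pyannote API):

```python
import numpy as np

def change_labels(turns, num_frames, frame_dur=0.01, collar=0.1):
    # turns: list of (start, end, speaker) tuples, in seconds.
    # Frames within `collar` seconds of a boundary where the
    # speaker changes are labelled 1, all other frames 0.
    labels = np.zeros(num_frames, dtype=int)
    turns = sorted(turns)
    for (_, _, spk_a), (start_b, _, spk_b) in zip(turns, turns[1:]):
        if spk_a == spk_b:
            continue  # same speaker on both sides: no change point
        lo = max(0, int(round((start_b - collar) / frame_dur)))
        hi = min(num_frames, int(round((start_b + collar) / frame_dur)))
        labels[lo:hi] = 1
    return labels
```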

[1] R. Yin, H. Bredin and C. Barras. Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks. Interspeech 2017.
[2] H. Bredin. TristouNet: Triplet Loss for Speaker Turn Embedding. ICASSP 2017.
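Approach [2] can be sketched with plain NumPy, assuming frame-level speaker embeddings are already available (the function names and fixed-size windows are illustrative, not the pyannote API):

```python
import numpy as np

def change_scores(embeddings, window=50):
    # Cosine distance between the mean embeddings of two adjacent
    # sliding windows centred at each frame; a high score suggests
    # a speaker change at that frame.
    scores = np.zeros(len(embeddings))
    for t in range(window, len(embeddings) - window):
        left = embeddings[t - window:t].mean(axis=0)
        right = embeddings[t:t + window].mean(axis=0)
        cos = np.dot(left, right) / (
            np.linalg.norm(left) * np.linalg.norm(right))
        scores[t] = 1.0 - cos
    return scores

def detect_changes(scores, threshold=0.5):
    # Final thresholding step: keep local maxima above the threshold
    # as hypothesized speaker change points.
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] >= scores[t - 1]
            and scores[t] >= scores[t + 1]]
```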

PiotrTa commented 6 years ago

So if I now have a new test file without any annotation and would like to apply change detection to it, do I have to run speech activity detection first to label the speech regions?

hbredin commented 6 years ago

Yes.

At some point, though, I would like to train a network that does both at once.
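Until then, the test-time pipeline above (speech activity detection first, then change detection restricted to speech regions) can be sketched as follows; the segmentation helper is hypothetical, not the actual pyannote implementation:

```python
def segment(speech_mask, change_points):
    # speech_mask: frame-level booleans from speech activity detection.
    # change_points: frame indices from speaker change detection.
    # Returns (start, end) frame ranges that are speaker-homogeneous.
    segments = []
    start = None
    changes = set(change_points)
    for t, is_speech in enumerate(speech_mask):
        if is_speech and start is None:
            start = t                    # speech region begins
        elif not is_speech and start is not None:
            segments.append((start, t))  # speech region ends
            start = None
        elif is_speech and t in changes:
            segments.append((start, t))  # split at speaker change
            start = t
    if start is not None:
        segments.append((start, len(speech_mask)))
    return segments
```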