google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

handle overlapped speech #18

Closed. PES2g closed this issue 5 years ago

PES2g commented 5 years ago

In your paper, you exclude overlapped speech during evaluation. Which of the following describes your approach?

  1. Your method processes the whole audio and ignores the errors that occur during overlapped speech.
  2. Your method first trims the overlapped parts from the audio, then processes the trimmed audio.

And which approach do you use during training?

wq2012 commented 5 years ago

Our method is consistent for both training and testing:

  1. Process the entire audio to produce speaker embeddings. (We don't want to run the speaker embedding LSTM on manually trimmed or concatenated audio.)
  2. Remove the speaker embeddings that correspond to overlapped speakers, in both the training and testing data.
  3. When computing diarization errors, the overlapped speech is removed from both the numerator and the denominator.

chienducnguyen commented 4 years ago

How do you know which speaker embeddings correspond to overlapped speakers and should be removed?

wq2012 commented 4 years ago

@chienducnguyen It's from the ground truth. The ground truth has segments labelled with two speakers.
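
For readers who want to try this, here is a minimal sketch of the filtering described in this thread. It is not code from the uis-rnn library. It assumes the ground truth is a list of hypothetical `(start, end, speakers)` tuples, that each embedding has a timestamp, and that hypothesis labels are already mapped to reference labels (a real DER computation also needs an optimal label mapping). The names `overlapped_mask`, `drop_overlapped`, and `frame_error_rate` are made up for illustration.

```python
import numpy as np


def overlapped_mask(frame_times, segments):
    """True for frames inside a ground-truth segment labelled with 2+ speakers.

    `segments` is an assumed format: (start_sec, end_sec, speakers) tuples,
    where `speakers` is a tuple of labels.
    """
    frame_times = np.asarray(frame_times, dtype=float)
    overlapped = np.zeros(len(frame_times), dtype=bool)
    for start, end, speakers in segments:
        if len(speakers) > 1:
            overlapped |= (frame_times >= start) & (frame_times < end)
    return overlapped


def drop_overlapped(embeddings, frame_times, segments):
    """Remove the embeddings whose timestamps fall in overlapped speech."""
    keep = ~overlapped_mask(frame_times, segments)
    return np.asarray(embeddings)[keep], np.asarray(frame_times)[keep]


def frame_error_rate(reference, hypothesis, overlapped):
    """Frame-level error with overlapped frames excluded from both the
    numerator (errors) and the denominator (scored frames)."""
    reference = np.asarray(reference)
    hypothesis = np.asarray(hypothesis)
    scored = ~np.asarray(overlapped, dtype=bool)
    errors = np.sum(reference[scored] != hypothesis[scored])
    return errors / np.sum(scored)


# Toy data: five embeddings of dimension 2, one per second of audio.
frame_times = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
embeddings = np.arange(10).reshape(5, 2)
segments = [
    (0.0, 2.0, ("A",)),
    (2.0, 3.0, ("A", "B")),  # overlapped: labelled with two speakers
    (3.0, 5.0, ("B",)),
]

kept_embeddings, kept_times = drop_overlapped(embeddings, frame_times, segments)

reference = ["A", "A", "A+B", "B", "B"]
hypothesis = ["A", "B", "A", "B", "B"]
rate = frame_error_rate(reference, hypothesis,
                        overlapped_mask(frame_times, segments))
```

The embeddings are computed on the full audio first and filtered afterwards, which matches step 1 above: the embedding network never sees trimmed or concatenated audio.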