chrisspen closed this issue 5 years ago
> you first have to train an embedding model on all your speakers ahead of time
The speaker "embedding" model (better call it "encoder" model, which makes more sense) is trained on a totally different dataset that does not contain the speakers which you want to diarize. This speaker encoder model is independent from any actually speakers. It is the "universal" model. Its input is audio, and its output is the speaker embedding of this audio.
I assume you can still comment on a closed issue, right?
Yes. I can now.
It seems that, in order to use uis-rnn, you first have to train an embedding model on all your speakers ahead of time, such as with a tool like this. However, how do you use uis-rnn to identify new or unknown speakers if it is limited to a pre-existing set? Does it maintain any sort of universal background model that it can use to compare and match unknown speakers against?