chrisspen closed this issue 5 years ago
> you first have to train an embedding model on all your speakers ahead of time
The speaker "embedding" model (better call it "encoder" model, which makes more sense) is trained on a totally different dataset that does not contain the speakers which you want to diarize. This speaker encoder model is independent from any actually speakers. It is the "universal" model. Its input is audio, and its output is the speaker embedding of this audio.
I assume you can still comment on a closed issue, right?
Yes. I can now.
It seems that, in order to use uis-rnn, you first have to train an embedding model on all your speakers ahead of time, such as with a tool like this. However, how do you use uis-rnn to identify new or unknown speakers if it is limited to a pre-existing set? Does it maintain any sort of universal background model that it can use to compare and match unknown speakers against?