google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

[Question] Can I use speaker-annotated datasets in languages other than English? #44

Closed muntasir2000 closed 5 years ago

muntasir2000 commented 5 years ago

Describe the question

Hi, this might be a naive question, but can I combine speaker-annotated speech corpora from different languages (English, Mandarin, etc.) and use them to train the speaker embedding component? Are speaker embeddings and UIS-RNN language independent?

My background

Have I read the README.md file? yes

Have I searched for similar questions from closed issues? yes

Have I tried to find the answers in the paper Fully Supervised Speaker Diarization? yes

Have I tried to find the answers in the reference Speaker Diarization with LSTM? yes

Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification? yes

wq2012 commented 5 years ago

@muntasir2000 Yes! And we are already doing that ourselves.

In the paper "Generalized End-to-End Loss for Speaker Verification", we described a technique called MultiReader. We have been using MultiReader to train a single model for multiple languages (currently 8 languages).

In Section 4.1 of the paper "Fully Supervised Speaker Diarization", we also mention that our d-vector V2 and V3 models are trained with non-English data, using MultiReader.
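
For readers who want a concrete picture of combining corpora from several languages, here is a minimal sketch of the idea as I understand it from the GE2E paper: each training step draws a batch from every data source, and the per-source losses are combined as a weighted sum. This is not code from this repository. The function name `multireader_loss`, the source names, the weights, and the loss values are all hypothetical placeholders; in real training the per-example losses would come from the speaker embedding model.

```python
from statistics import fmean


def multireader_loss(batch_losses, weights):
    """Combine per-source batch losses into one weighted training loss.

    batch_losses: dict mapping a data source name to the list of
        per-example losses from that source's batch (placeholder values here).
    weights: dict mapping the same source names to their weights
        (hypothetical hyperparameters, chosen by the user).
    """
    missing = set(batch_losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for sources: {sorted(missing)}")
    # Average within each source first, so a large batch from one corpus
    # does not dominate, then apply that source's weight.
    return sum(
        weights[name] * fmean(losses) for name, losses in batch_losses.items()
    )


# Hypothetical per-example losses for one training step.
batch_losses = {
    "english": [0.5, 0.7],
    "mandarin": [1.0, 1.5, 0.5],
}
weights = {"english": 1.0, "mandarin": 0.5}

combined = multireader_loss(batch_losses, weights)
print(combined)
```

Averaging within each source before weighting means the weights, rather than the relative corpus or batch sizes, control how much each language contributes to the update.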