google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

uis-rnn error is too high. #63

wychoi44 closed this issue 4 years ago

wychoi44 commented 4 years ago

Describe the question

The d-vector embeddings were trained on the TIMIT corpus (EER = 4.17%) using the code at https://github.com/HarryVolek/PyTorch_Speaker_Verification. For the UIS-RNN experiments, I trained on the Switchboard corpus and tested on the Callhome American English dataset, as described in the paper "Fully Supervised Speaker Diarization". However, the DER was 30.1%, far from the 11.7% reported in the paper. The DER on the toy data is good (0.4%). I used the provided configuration unchanged. What could be causing the gap?
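For readers comparing error numbers like the ones above, here is a minimal sketch of a frame-level error under the best one-to-one mapping between predicted and reference speaker labels. It is a simplification and not the scoring used in the paper: it assumes one label per frame, and it ignores missed speech, false-alarm speech, overlap, and collars. The function name `frame_error_rate` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def frame_error_rate(reference, hypothesis):
    """Fraction of frames mislabeled under the best one-to-one label mapping.

    Simplified stand-in for DER: speaker confusion only, one label per frame.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        return 0.0

    # Count how often each (reference label, hypothesis label) pair co-occurs.
    overlap = Counter(zip(reference, hypothesis))

    ref_labels = list(dict.fromkeys(reference))
    hyp_labels = list(dict.fromkeys(hypothesis))

    # Pad the shorter side with None so both sides have the same size;
    # a label mapped to None simply matches no frames.
    size = max(len(ref_labels), len(hyp_labels))
    ref_labels += [None] * (size - len(ref_labels))
    hyp_labels += [None] * (size - len(hyp_labels))

    # Brute-force search over mappings; fine for a handful of speakers only.
    best_matched = max(
        sum(overlap.get(pair, 0) for pair in zip(ref_labels, perm))
        for perm in permutations(hyp_labels)
    )
    return 1.0 - best_matched / len(reference)


reference = [0, 0, 0, 1, 1, 1]
hypothesis = [1, 1, 1, 0, 0, 2]
error = frame_error_rate(reference, hypothesis)
print(f"{error:.4f}")
```

The permutation search grows factorially with the number of speakers, so for more than a few speakers an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual replacement.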

My background

Have I read the README.md file?

Have I searched for similar questions from closed issues?

Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?

Have I tried to find the answers in the reference Speaker Diarization with LSTM?

Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?

wq2012 commented 4 years ago

@wychoi44

First, as stated in the README.md file, we are not responsible for the correctness of any third-party implementations of the GE2E paper.

Besides, we never use TIMIT to train the speaker recognition model. VoxCeleb 1 and 2 are known to have more variation and better quality (and they are publicly available). Our own model is based on an internal dataset of 30M+ utterances from 100K+ speakers.

Also, you always need to retune the uis-rnn parameters if you use a different speaker recognition model.
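As a rough illustration of what retuning could look like, the sketch below runs a plain grid search over two parameters and keeps the setting with the lowest score. Everything here is an assumption for illustration: `grid_search` and `dummy_evaluate` are hypothetical helpers, the parameter names `crp_alpha` and `sigma2` are meant to mirror the library's model arguments and should be checked against its argument parser, and in practice `evaluate` would train and score a model on a held-out set with your own embeddings.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Return (best_params, best_score) minimizing evaluate(params)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. DER on a development set
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def dummy_evaluate(params):
    """Hypothetical stand-in for 'train, predict, and compute DER'."""
    return abs(params["crp_alpha"] - 1.0) + abs(params["sigma2"] - 0.1)


grid = {"crp_alpha": [0.5, 1.0, 2.0], "sigma2": [0.05, 0.1, 0.2]}
best_params, best_score = grid_search(dummy_evaluate, grid)
print(best_params, best_score)
```

Each grid point implies a full training run, so a coarse grid on a small development set is a reasonable starting point before refining.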