@wychoi44
First, as we stated in the README.md file, we are not responsible for the correctness of any 3rd-party implementations of the GE2E paper.
Besides, we never used TIMIT for training the speaker recognition model. VoxCeleb 1 and 2 are known to have better variation and quality (and they are publicly available). Our own model is trained on an internal dataset of 30M+ utterances from 100K+ speakers.
Also, you always need to retune the uis-rnn parameters if you use a different speaker recognition model.
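To make that concrete, here is a minimal sketch of what retuning looks like with the uisrnn Python package, following the pattern in this repository's demo (uisrnn.parse_arguments(), uisrnn.UISRNN, model.fit, model.predict). The synthetic data and the specific values below (crp_alpha, learning rate, beam size, etc.) are placeholders to sweep on a dev set for your own embeddings, not recommended settings, and argument names may differ across versions:

```python
import numpy as np
import uisrnn

# Default model/training/inference arguments defined by the package.
model_args, training_args, inference_args = uisrnn.parse_arguments()

# Tiny synthetic data so the sketch runs end to end; replace with your real
# d-vector sequence (a (length, dim) float array) and per-frame speaker labels.
dim = 16
model_args.observation_dim = dim  # must match your d-vector dimension
train_sequence = np.random.randn(100, dim)
train_cluster_id = ['0_spk0'] * 50 + ['0_spk1'] * 50

# Knobs that usually need re-sweeping when the speaker recognition model
# changes; the values here are placeholders, not recommendations.
model_args.crp_alpha = 1.0
training_args.learning_rate = 1e-4
training_args.train_iteration = 1000
inference_args.beam_size = 10

model = uisrnn.UISRNN(model_args)
model.fit(train_sequence, train_cluster_id, training_args)
predicted_cluster_ids = model.predict(np.random.randn(20, dim), inference_args)
print(predicted_cluster_ids)
```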
Describe the question
The d-vector embeddings are trained on the TIMIT corpus (EER = 4.17%) using this GitHub implementation: https://github.com/HarryVolek/PyTorch_Speaker_Verification. For the UIS-RNN experiments, I trained on the Switchboard corpus and tested on the CALLHOME American English dataset, as mentioned in the paper "Fully Supervised Speaker Diarization". However, my DER was 30.1%, which is far from the result in the paper (11.7%). The DER on the toy data is pretty good (0.4%). I used the same configurations provided. What is the problem?
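Since large DER gaps can also come from the scoring setup (collar width, overlap handling) rather than the model itself, here is a minimal sketch of scoring DER with pyannote.metrics, which is not part of this repo but one common way to compute it; the segments and speaker names are made up:

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Made-up reference (ground truth) and hypothesis (system output).
reference = Annotation()
reference[Segment(0.0, 10.0)] = 'spk_A'
reference[Segment(10.0, 20.0)] = 'spk_B'

hypothesis = Annotation()
hypothesis[Segment(0.0, 12.0)] = 'spk_1'
hypothesis[Segment(12.0, 20.0)] = 'spk_2'

# collar is the total duration removed around each reference boundary
# (0.5 s total = 0.25 s on each side, the common NIST-style convention);
# verify it matches the setup used in the paper you compare against.
metric = DiarizationErrorRate(collar=0.5, skip_overlap=True)
print(f'DER = {metric(reference, hypothesis):.3f}')
```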
My background
Have I read the README.md file?
Have I searched for similar questions from closed issues?
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
Have I tried to find the answers in the reference Speaker Diarization with LSTM?
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?