modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

[ENG] CAM++ EER Results on VoxCeleb-O/E/H datasets #149

Open K1ndWha1e opened 1 week ago

K1ndWha1e commented 1 week ago

Hello! I have a question about the EER results reported in your research paper. My question concerns only the English version of CAM++ (the results for the Chinese version all look right). First, let me explain why I'm asking: I'm doing research for my PhD (exploring neural networks for speaker verification/identification tasks). I tested your method on my dataset and got EER = 23.79% (see the picture below).

CAM++_ENG

I thought I had made a mistake in parametrization or something like that, but everything was right (yes, I checked how you extract fbank features with Kaldi). I then decided to measure FAR and FRR on VoxCeleb-O/E/H at a fixed threshold of 0.7 and got the following results (FAR / FRR in %):

|           | VoxCeleb-O     | VoxCeleb-E     | VoxCeleb-H          |
|-----------|----------------|----------------|---------------------|
| FAR / FRR | 0.0% / 64.95%  | 0.0% / 65.45%  | 3×10⁻⁴% / 65.31%    |

These results are much closer to what I see on my own dataset than to your reported EER results, which are all below 1%. So here are my questions: Can you help me understand what's going on? Is there an error in the research paper (again, the Chinese results look right)? Is your public checkpoint the same one you used in your research?

Thx!
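For context on the numbers above: EER is the operating point where FAR and FRR are equal, so a single fixed threshold (such as 0.7 here) can show an extreme FAR/FRR split even when the EER is low. Below is a minimal, hypothetical sketch of sweeping thresholds to find the EER from score lists; this is not the repository's evaluation code, and the score arrays are made up for illustration.

```python
import numpy as np

def compute_eer(scores, labels):
    """Sweep thresholds over the scores and return the point where
    FAR (impostors accepted) and FRR (targets rejected) are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_diff, best_eer = 1.0, 0.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[~labels] >= t)  # impostor trials scored above threshold
        frr = np.mean(scores[labels] < t)    # target trials scored below threshold
        if abs(far - frr) < best_diff:
            best_diff, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# toy example: well-separated target vs. impostor scores give EER = 0
scores = [0.9, 0.8, 0.85, 0.7, 0.2, 0.3, 0.1, 0.4]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(compute_eer(scores, labels))  # 0.0 for perfectly separable scores
```

A badly chosen fixed threshold on the same scores could still report a large FRR, which is why comparing FAR/FRR at threshold 0.7 against a paper's EER can be misleading.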

wanghuii1 commented 6 days ago
  1. What are your EER results on VoxCeleb-O/E/H, and was the model that produced these results trained on VoxCeleb?
  2. The checkpoint used in the paper is https://modelscope.cn/models/iic/speech_campplus_sv_en_voxceleb_16k
K1ndWha1e commented 6 days ago
I got the following results:

| VoxCeleb-O | VoxCeleb-E | VoxCeleb-H |
|------------|------------|------------|
| 1.99%      | 1.99%      | 3.67%      |

Pics

  1. VoxCeleb-O

  2. VoxCeleb-H

  3. VoxCeleb-E


That looks much more similar to your results... Hmm, I'll think about what's wrong with my dataset. Last questions: do I need to train the model for every European language I want to test? Are MFCC features invariant to a change of language (not English, I mean, but another language from the same language group)?

wanghuii1 commented 6 days ago
  1. Regarding the test results for VoxCeleb-O/E/H, please ensure you have used the appropriate model and test set. You can use the command `python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path` to extract embeddings, with `model_id="iic/speech_campplus_sv_en_voxceleb_16k"`.
  2. Actually, it is not recommended to use an English model for speaker recognition tasks in other languages, as this will likely lead to a decrease in accuracy. You might consider collecting some data from the same language group and training a new model to get better results.

If you have any other questions, please feel free to ask me.
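Once embeddings have been extracted for a trial pair, verification typically reduces to a cosine-similarity score compared against a threshold. A minimal sketch (the `.npy` paths are hypothetical placeholders, not guaranteed output paths of `infer_sv.py`):

```python
import numpy as np

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings;
    higher means the two utterances are more likely the same speaker."""
    emb1 = np.asarray(emb1, dtype=float)
    emb2 = np.asarray(emb2, dtype=float)
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))

# hypothetical usage with embeddings saved to disk:
# e1 = np.load("embeddings/spk1_utt1.npy")
# e2 = np.load("embeddings/spk1_utt2.npy")
# same_speaker = cosine_score(e1, e2) >= threshold

v = np.array([1.0, 0.0, 1.0])
print(round(cosine_score(v, v), 3))  # 1.0: identical embeddings
```

The threshold itself should be tuned on a held-out trial list (e.g. at the EER point), not fixed a priori.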

K1ndWha1e commented 4 days ago

Okay, thanks so much! I want to test on the Common Voice dataset, but it seems too huge to unpack :) I'll share my results this weekend.