wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

Could you please provide the pt file and onnx file of the pre-trained ECAPA-TDNN model? #284

Closed. Zhubisong closed this issue 4 months ago

Zhubisong commented 4 months ago

I can't find the corresponding files in this repository. I want to use your model as the baseline in a paper. Could you provide them?

JiJiJiang commented 4 months ago

Thank you for your comment. We will upload the ECAPA-TDNN models trained on the VoxCeleb dataset.

Zhubisong commented 4 months ago

Thank you for uploading. However, I have a question and a request.

  1. I am not sure what happened, but when I use voxceleb_ECAPA_512LM to extract embeddings for VoxConverse, the embedding dimension is 192, not 512.
  2. I ran some experiments on Dihard1 and AMI SDM and found that ResNet34 without LM performed better than ResNet34LM, while on VoxConverse the LM model is better. The paper "Adaptive Large Margin Fine-Tuning for Robust Speaker Verification" may account for this: LM is useful for long audio, whereas the segments used in speaker diarization are short. Could you please also upload ECAPA without LM?

JiJiJiang commented 4 months ago

  1. The 512 refers to the channel dimension, not the embedding size, which is 192, as you mention.
  2. Models with and without LM perform differently on the diarization task depending on the dataset. I would expect the DER difference to be very marginal, but I am interested in your conclusion. The models without LM will be uploaded later.
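To make the first point concrete, here is a minimal shape-bookkeeping sketch in plain Python. It is not WeSpeaker's code: the class name `EcapaShapeSketch` is hypothetical, and the three concatenated blocks and the 1536-channel aggregation layer are assumptions based on the commonly used ECAPA-TDNN configuration. It only illustrates that the channel width (the number in the model name) and the embedding size are separate hyperparameters.

```python
from dataclasses import dataclass


@dataclass
class EcapaShapeSketch:
    """Toy shape tracker for an ECAPA-TDNN-style model (illustrative only)."""

    channels: int = 512   # width of the convolutional blocks (the "512" in the name)
    embed_dim: int = 192  # size of the final speaker embedding
    num_blocks: int = 3   # block outputs that get concatenated (assumption)
    mfa_out: int = 1536   # channels after the aggregation conv (assumption)

    def shapes(self, num_frames: int) -> dict:
        """Return the tensor shape at each stage for one utterance."""
        concat = self.channels * self.num_blocks
        pooled = 2 * self.mfa_out  # statistics pooling stacks mean and std
        return {
            "block_output": (self.channels, num_frames),
            "concatenated": (concat, num_frames),
            "aggregated": (self.mfa_out, num_frames),
            "pooled": (pooled,),
            "embedding": (self.embed_dim,),
        }


if __name__ == "__main__":
    # Changing the channel width changes intermediate shapes only;
    # the embedding size is set by embed_dim.
    for width in (512, 1024):
        s = EcapaShapeSketch(channels=width).shapes(num_frames=200)
        print(width, s["block_output"], s["embedding"])
```

Under these assumptions, widening the network from 512 to 1024 channels leaves the embedding at 192 dimensions, because the final linear layer alone determines the embedding size.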

JiJiJiang commented 4 months ago

done

wsstriving commented 4 months ago

Finished. See https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md