Sorry for the late reply. Sanyuan will answer your question soon @FredMushZhao
Hi @FredMushZhao, here are some details on how we fine-tune WavLM-Large on VoxCeleb2:
Stage 1:
Stage 2:
Stage 3:
Note that we find the pre-trained models are prone to overfitting on the ASV training data, and more training steps lead to significant performance degradation.
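As a rough sketch of what such a staged schedule with a capped step budget can look like (module names, the optimizer, and the step counts are placeholders, not the exact recipe behind the paper numbers):

```python
# Illustrative staged fine-tuning loop with a small step budget per stage
# (placeholder names; not the exact recipe used for the reported results).
import torch


def set_trainable(module: torch.nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag


def run_stage(model, train_step, max_steps, lr):
    # Keep max_steps small: fine-tuning the pre-trained encoder too long on
    # the ASV data overfits and degrades verification performance.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(params, lr=lr)
    for _ in range(max_steps):
        loss = train_step(model)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Stage 1: freeze the WavLM encoder, train only the speaker back-end.
#   set_trainable(model.wavlm, False); run_stage(model, train_step, max_steps=..., lr=...)
# Stage 2: unfreeze everything and fine-tune briefly with a small learning rate.
#   set_trainable(model.wavlm, True);  run_stage(model, train_step, max_steps=..., lr=...)
```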
Hi @FredMushZhao, did you reach the performance described in the paper?
Not yet. We tried a few times, and our best result on vox1-O clean is EER = 1.7 after stage 1 and EER = 1.2 after stage 2. Still working on it...
Our best result on vox1-O clean is EER = 0.7% after stage 1, but with epoch_num = 60; furthermore, it's hard to improve in stage 2.
Hi @FredZZD, could you please share the code for fine-tuning WavLM on the speaker verification task? I'm new to this work. Thanks a lot.
Hey (: I'd like to join @dntuong's request - code for fine-tuning WavLM + ECAPA-TDNN for speaker verification would be excellent.
Hi @DoriRimon @dntuong, did you get the code for fine-tuning WavLM + ECAPA-TDNN? Could you please share it?
Interested in the fine-tuning code as well!
Hey @DoriRimon , @dntuong , @arvindmn01 , were you guys able to get the code? I'd be really interested as well!
I tried to reproduce your work on vox1-O but cannot reach the performance described in the paper. Here is my implementation:

- WavLM-Large from huggingface (microsoft/wavlm-large)
- ECAPA-TDNN (base) from speechbrain
- AAM-softmax with m=0.2, s=30
- constant lr=5e-5, weight-decay=0
- intertop-k=5
- batch=512
- stage 1: freeze WavLM, train ECAPA for 20 epochs (about 45k steps), chunk_size=3s, got eer=1.6
- stage 2: unfreeze WavLM, train both for 5 epochs (about 12k steps), chunk_size=3s, got eer=0.82 (0.61 in paper)

Is there any difference from yours? Thanks
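For reference, here is a rough sketch of how I wire the model and loss together (simplified: chunking, data loading, and any layer-weighted sum over WavLM layers are omitted, and the class names are my own, not training code from the authors):

```python
# Rough sketch: WavLM-Large features -> ECAPA-TDNN embedding -> AAM-softmax loss.
import torch
import torch.nn.functional as F
from transformers import WavLMModel
from speechbrain.lobes.models.ECAPA_TDNN import ECAPA_TDNN


class AAMSoftmax(torch.nn.Module):
    """Additive angular margin softmax (m=0.2, s=30)."""

    def __init__(self, embed_dim, num_classes, margin=0.2, scale=30.0):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(num_classes, embed_dim))
        torch.nn.init.xavier_normal_(self.weight)
        self.margin, self.scale = margin, scale

    def forward(self, emb, labels):
        # Cosine similarity between L2-normalised embeddings and class weights.
        cosine = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the margin m only to the target-class angle.
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)


class WavLMEcapa(torch.nn.Module):
    def __init__(self, num_speakers, embed_dim=192):
        super().__init__()
        self.wavlm = WavLMModel.from_pretrained("microsoft/wavlm-large")
        # WavLM-Large hidden size is 1024; ECAPA-TDNN expects (batch, time, feat).
        self.ecapa = ECAPA_TDNN(input_size=1024, lin_neurons=embed_dim)
        self.loss = AAMSoftmax(embed_dim, num_speakers)

    def forward(self, wav, labels=None):
        feats = self.wavlm(wav).last_hidden_state   # (B, T, 1024)
        emb = self.ecapa(feats).squeeze(1)          # (B, embed_dim)
        if labels is None:
            return emb                              # inference: speaker embedding
        return self.loss(emb, labels)
```

In stage 1 I only pass the ECAPA and loss parameters to the optimizer; in stage 2 I pass all parameters.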