microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.64k stars 2.51k forks

finetune wavlm-large on speaker verification task #695

FredZZD closed this issue 2 years ago

FredZZD commented 2 years ago

I tried to reproduce your work on vox1-o but cannot reach the performance described in the paper. Here is my implementation:

wavlm-large from huggingface/microsoft/wavlm-large
ecapa-tdnn-base from speechbrain
aam-softmax with m=0.2, s=30
constant lr=5e-5
weight-decay=0
intertop-k=5
batch=512

stage1: freeze wavlm, train ecapa for 20 epochs (about 45k steps), chunk_size=3s, got eer=1.6
stage2: unfreeze wavlm, train both for 5 epochs (about 12k steps), chunk_size=3s, got eer=0.82 (0.61 in paper)

Is there any difference from yours? Thanks
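For readers unfamiliar with the loss named above, here is a minimal sketch of an additive angular margin (AAM) softmax for a single sample, using the m=0.2 and s=30 values from this comment as defaults. It is written in plain Python for illustration only and is not the code used by the poster or the paper's authors. The function name `aam_softmax_loss` and its arguments are hypothetical, and the sketch omits the guard that practical implementations usually add for the case where the angle plus the margin exceeds pi.

```python
import math


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for one sample (illustrative sketch).

    embedding:     list of floats, the speaker embedding
    class_weights: list of weight vectors, one per training speaker
    target:        index of the true speaker
    """

    def l2_normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    emb = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        # Cosine similarity between the normalized embedding and class weight.
        cos = sum(a * b for a, b in zip(emb, l2_normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # keep acos in its valid domain
        if idx == target:
            # Penalize the true class by adding the margin to its angle.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)

    # Cross-entropy via a numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    emb = [1.0, 1.0]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    print(aam_softmax_loss(emb, weights, target=0, margin=0.0))
    print(aam_softmax_loss(emb, weights, target=0, margin=0.2))
```

With the margin enabled, the loss for the same embedding is larger, which is what pushes embeddings of the same speaker closer together during training.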

MarkWuNLP commented 2 years ago

Sorry for the late reply. Sanyuan will answer your question soon @FredMushZhao

Sanyuan-Chen commented 2 years ago

Hi @FredMushZhao, here are some details of how we fine-tune WavLM-Large on VoxCeleb2:

Note that we find the pre-trained models are prone to overfitting on the ASV training data, and more training steps lead to significant performance degradation.

WindAIer commented 2 years ago

Hi @FredMushZhao, did you reach the performance described in the paper?

FredZZD commented 2 years ago

Not yet. We tried a few times, and the best result on vox1o-clean is eer=1.7 after stage1 and eer=1.2 after stage2. Still working on it...

WindAIer commented 2 years ago

Our best result on vox1o-clean is eer=0.7% after stage1, but with epoch_num=60. Furthermore, it is hard to improve in stage2.
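Since the numbers compared in this thread are all equal error rates (EER), here is a minimal sketch of how EER can be computed from trial scores. It is a plain-Python illustration, not the scoring script used by anyone in this thread. The function name `equal_error_rate` is hypothetical, and the sketch assumes that higher scores mean "same speaker" and that labels are 1 for target trials and 0 for non-target trials. It returns a fraction, so multiply by 100 to compare with the percentages quoted above.

```python
def equal_error_rate(scores, labels):
    """Approximate EER: the point where false-accept and false-reject rates meet.

    scores: similarity scores, higher means more likely the same speaker
    labels: 1 for target (same-speaker) trials, 0 for non-target trials
    """
    num_targets = sum(labels)
    num_nontargets = len(labels) - num_targets
    if num_targets == 0 or num_nontargets == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sweep the threshold from the highest score downward; every trial at or
    # above the threshold is accepted.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    accepted_targets = 0
    accepted_nontargets = 0
    # Threshold above every score: nothing accepted, so FAR=0 and FRR=1.
    best_gap, eer = 1.0, 0.5
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        # Tied scores share one threshold, so evaluate after the last of them.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        far = accepted_nontargets / num_nontargets      # false-accept rate
        frr = 1.0 - accepted_targets / num_targets      # false-reject rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
    print(equal_error_rate([0.9, 0.6, 0.7, 0.1], [1, 1, 0, 0]))
```

On a finite trial list the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold. Evaluation toolkits may interpolate differently, which can shift results slightly.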

dntuong commented 1 year ago

Hi @FredZZD, could you please share the code for fine-tuning WavLM on the speaker verification task? I'm new to this work. Thanks a lot

DoriRimon commented 1 year ago

Hey (: I'd like to join @dntuong's request. Code for fine-tuning WavLM + ECAPA-TDNN for speaker verification would be excellent

arvindmn01 commented 5 months ago

Hi @DoriRimon @dntuong, did you get the code for fine-tuning WavLM + ECAPA-TDNN? Could you please share it?

berkcoker commented 1 month ago

Interested in the fine-tuning code as well!