finetune wavlm-large on speaker verification task

FredZZD commented 2 years ago

I tried to reproduce your work on vox1-o, but cannot reach the performance described in the paper, here is my implementation wavlm-large from huggingface/microsoft/wavlm-large ecapa-tdnn-base from speechbrain aam-softmax with m=0.2, s=30 constant lr=5e-5 weight-decay=0 intertop-k=5 batch=512 stage1 freeze wavlm, train ecapa 20 epochs, about 45k steps, chunk_size=3s got eer=1.6 stage2 unfreeze wavlm, train both 5 epochs about 12k steps, chunk_size=3s got eer=0.82 (0.61 in paper) Is there any difference with yours? Thanks

MarkWuNLP commented 2 years ago

Sorry for the late reply. Sanyuan will answer your question soon @FredMushZhao

Sanyuan-Chen commented 2 years ago

Hi @FredMushZhao , Here are some details that we fine-tune WavLM-Large on Voxceleb2:

Stage 1:
- freeze wavlm;
- batch_size=1024;
- epoch_num=20 (about 22k steps);
- AAM-softmax with scale=32, margin=0 for the first 2 epoch, margin=0.2 for the last 18 epochs
- intertop-k=5, intertop-m=0.1;
- chunk_size=3s
- online data augmention using the MUSAN noise, DNS noise and the RIR reverberation with probability=0.6
- SGD optimizer with momentum=0.9, weight_decay=4.0e-4
- exponential decrease learning rate schedule (initial_lr=0.64, final_lr=4e-4) with warmup coefficient for the first 1 epoch, resulting in the learning rate increase from 0 to about 0.45 for the first 1 epoch, then decrease exponentially to 4e-4 for the last 19 epochs.
Stage 2:
- finetune wavlm with learning rate scale=0.01;
- batch_size=512;
- epoch_num=5 (about 11k steps);
- AAM-softmax with scale=32, margin=0.2
- intertop-k=5, intertop-m=0.1;
- chunk_size=3s
- online data augmention using the MUSAN noise, DNS noise and the RIR reverberation with probability=0.6
- SGD optimizer with momentum=0.9, weight_decay=1.0e-4
- exponential decrease learning rate schedule (initial_lr=8e-3, final_lr=4.4e-3) with warmup coefficient for the first 3 epoch, resulting in the learning rate increase from 0 to about 5.5e-3 for the first 3 epoch, then decrease exponentially to 4.4e-3 for the last 2 epochs.
Stage 3:
- finetune wavlm with learning rate scale=0.01;
- batch_size=192;
- epoch_num=2 (about 11k steps);
- AAM-softmax with scale=32, margin=0.4
- chunk_size=6s
- online data augmention using the MUSAN noise, DNS noise and the RIR reverberation with probability=0.6
- SGD optimizer with momentum=0.9, weight_decay=1.0e-4
- exponential decrease learning rate schedule (initial_lr=8e-4, final_lr=2e-4) with warmup coefficient for the first 2 epoch, resulting in the learning rate increase from 0 to about 2.3e-4 for the first 6k steps, then decrease to 2e-4 for the 5k steps.

Note that we find the pre-trained models are prone to overfit on the ASV training data, and more training steps would lead to significant performance degradation.

WindAIer commented 2 years ago

Hi！@FredMushZhao,Did you reach the performance described in the paper?

FredZZD commented 2 years ago

Hi！@FredMushZhao,Did you reach the performance described in the paper?

Not yet, we tried a few times, and the best result on vox1o-clean is eer=1.7 after stage1 and eer=1.2 after stage2 Still working on it...

WindAIer commented 2 years ago

Not yet, we tried a few times, and the best result on vox1o-clean is eer=1.7 after stage1 and eer=1.2 after stage2 Still working on it...

our best result on vox1o-clean is eer=0.7% after stage1,but the epoch_num=60；furthermore，it's hard to improve in stage2

dntuong commented 1 year ago

Hi @FredZZD , could you please share the code finetuning wavlm on speaker verification task? I'm new in this work. Thanks a lot

DoriRimon commented 1 year ago

Hey (: I'd like to join @dntuong 's request - a code for fine-tuning WavLM + ECAPA-TDNN for speaker verification would be excellent

arvindmn01 commented 7 months ago

Hi @DoriRimon @dntuong, did you get the code for fine-tuning the WavLM + ECAPA-TDNN? could you please share the code?

berkcoker commented 3 months ago

Interested in the fine-tuning code as well!

DhwanilV commented 4 days ago

Hey @DoriRimon , @dntuong , @arvindmn01 , were you guys able to get the code? I'd be really interested as well!

microsoft / unilm

finetune wavlm-large on speaker verification task #695