Closed: P1ping closed this pull request 6 months ago.
@P1ping Hello, thanks for your contribution! If you reuse files from other recipes under WeSpeaker, can you link to the directory or file instead? It is best not to copy them directly.
@P1ping Hello,
There are some confusing points in your results table. Was your baseline model pretrained on CNCeleb?
If it was trained directly, then the header should be split into pretraining data and training data.
@Hunterhuan Hi! Yes, the baseline was pretrained on CNCeleb for 75 epochs and then fine-tuned for 50 epochs. Both blocks of results are based on 75-epoch pretraining, one on CNCeleb and the other on WenetSpeech; the fine-tuning data is consistently labeled CNCeleb.
Since the results of direct training are already detailed in v2/README.md, they were simply duplicated here for reference.
Add the recipe cnceleb/v3_finetune for fine-tuning after self-supervised learning (v3), currently supporting the DINO-based ECAPA-TDNN:
- run.sh is copied from v2 and adapted for initialization with the pretrained model, with default arguments for DINO-based ECAPA-TDNN fine-tuning (a minimal initialization sketch follows this list).
- local, path.sh, wespeaker, and tools are copied from cnceleb/v2.
- conf contains the fine-tuning (50 epochs) and the LM fine-tuning (same as v2) configurations (an illustrative large-margin sketch also follows this list).
- README.md contains the fine-tuning results of DINO models pretrained on CNCeleb and on WenetSpeech (filtered), along with references to the papers and links to the pretrained models.
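
For context on what "initialization with the pretrained model" usually amounts to, here is a minimal PyTorch sketch. The `EcapaEmbedding` stand-in, the checkpoint path, the optional `"model"` wrapper key, and loading with `strict=False` are all assumptions for illustration; they are not taken from the recipe's actual code.

```python
import torch
import torch.nn as nn


class EcapaEmbedding(nn.Module):
    """Stand-in for an ECAPA-TDNN speaker-embedding network (illustrative only)."""

    def __init__(self, feat_dim: int = 80, embed_dim: int = 192):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.embed_proj = nn.Linear(512, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim, frames)
        hidden = self.frame_layers(feats)   # (batch, 512, frames)
        pooled = hidden.mean(dim=-1)        # simple temporal average pooling
        return self.embed_proj(pooled)      # (batch, embed_dim)


def init_from_pretrained(model: nn.Module, ckpt_path: str) -> None:
    """Copy self-supervised (e.g. DINO-pretrained) backbone weights into the model.

    strict=False skips keys that exist only on the self-supervised side,
    such as a DINO projection head.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]  # some checkpoints wrap the state_dict
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)


if __name__ == "__main__":
    model = EcapaEmbedding()
    # Hypothetical checkpoint path; in practice this points to the DINO model.
    init_from_pretrained(model, "exp/dino_ecapa/models/avg_model.pt")
    # ...then run supervised fine-tuning on labeled CNCeleb data.
```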
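On the LM (large-margin) fine-tuning configuration: in speaker-verification recipes this typically means a few extra epochs with a larger additive angular margin in the AAM-softmax loss. The sketch below shows only that margin mechanism; the class name, margin values (0.2 raised to 0.5), and scale are illustrative assumptions, not the recipe's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AAMSoftmaxHead(nn.Module):
    """Additive angular margin (AAM-softmax / ArcFace-style) classification head."""

    def __init__(self, embed_dim: int, num_classes: int,
                 margin: float = 0.2, scale: float = 32.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.margin = margin
        self.scale = scale

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        # Add the angular margin only to the target-class logits.
        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)


if __name__ == "__main__":
    head = AAMSoftmaxHead(embed_dim=192, num_classes=1000, margin=0.2)
    emb = torch.randn(8, 192)
    lab = torch.randint(0, 1000, (8,))
    loss = head(emb, lab)
    print(loss.item())
    # Large-margin fine-tuning would reload the trained weights and
    # continue for a few epochs with a larger margin, e.g.:
    head.margin = 0.5
```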