wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

Question on Fine-Tuning Pre-Trained Models #321

Closed: axuan731 closed this issue 1 month ago

axuan731 commented 2 months ago

Hello,

Thank you for your contributions to the wespeaker project. I am trying to fine-tune a pre-trained model on another dataset. I added a 'model_init' entry to the config file in the 'conf' directory, pointing to the pre-trained model, but my fine-tuning results are not very satisfactory.

Should I fine-tune for the full 100 epochs, as in regular training, or train for only 5 epochs, as in the LMF (large-margin fine-tuning) strategy? Are there any specific parameters I need to pay attention to during training?

I also noticed that the training accuracy for each epoch during fine-tuning is 0% (although the final test results are slightly better than without fine-tuning).

Thank you in advance for your help!

JiJiJiang commented 1 month ago

During fine-tuning, since a new dataset is used, the total num_spk changes, so the final output projection layer must be trained from scratch. That is why the accuracy starts from 0 and then slowly increases.
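
Roughly, the loading step looks like this (a minimal PyTorch sketch, not wespeaker's exact loading code; the checkpoint path is a placeholder). Parameters whose shapes no longer match, i.e. the projection weights sized by num_spk, are dropped, so that layer stays randomly initialized:

```python
import torch

def load_for_finetune(model, ckpt_path):
    # Load the pre-trained state dict on CPU.
    pretrained = torch.load(ckpt_path, map_location="cpu")
    model_state = model.state_dict()
    # Keep only parameters that exist in the new model with the same shape;
    # the output projection (new num_spk) is filtered out here.
    filtered = {
        k: v for k, v in pretrained.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    skipped = sorted(set(model_state) - set(filtered))
    model.load_state_dict(filtered, strict=False)
    print(f"Trained from scratch (not loaded): {skipped}")
    return model
```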

You can also try combining the new dataset with the original training dataset. Keep margin=0.2, use a small learning rate (e.g., lr=5.0e-05), and train for only a few epochs (e.g., 10).
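
For reference, the relevant fields of a fine-tuning config could look like this (a sketch following the usual layout of wespeaker's conf/*.yaml recipes; key names and the checkpoint path are assumptions to check against your own recipe):

```yaml
# Hedged sketch of fine-tuning overrides; verify key names against your recipe.
model_init: exp/pretrained/avg_model.pt   # path to the pre-trained model (placeholder)
num_epochs: 10                            # a few epochs are enough for fine-tuning

margin_scheduler: MarginScheduler
margin_update:
  initial_margin: 0.2
  final_margin: 0.2                       # keep the margin fixed at 0.2
  update_margin: False

scheduler: ExponentialDecrease
scheduler_args:
  initial_lr: 5.0e-5                      # small learning rate for fine-tuning
  final_lr: 5.0e-6
```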