mailong25 / self-supervised-speech-recognition

speech to text with self-supervised learning based on wav2vec 2.0 framework

Finetuning a Vietnamese model #19

NhacBatQuan closed this issue 3 years ago.

NhacBatQuan commented 3 years ago

Hi Mailong, we are doing a group project with wav2vec 2.0 and training the model on Colab with the VLSP dataset. Because Colab's resources are limited, we are fine-tuning your pretrained model on our own dataset (around 60-70 hours) using the 100h config. Training seemed fine at first, but once the train loss reached 400, the valid loss suddenly increased significantly, and so did the WER; only the UER seems to keep decreasing. The train loss is also dropping very slowly, by about 1 point per epoch. Can you help us with this problem?
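To see how WER and UER can move differently, here is a minimal sketch of both metrics. It assumes UER is the edit distance over output units, taken here to be plain characters with spaces included, as with letter-based CTC targets. This is an illustration only, not fairseq's own implementation. A handful of character errors spread across words can leave UER low while WER stays high, because one wrong character makes the whole word wrong.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] = distance between the first i-1 ref items and the first j hyp items
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def error_rates(refs, hyps):
    """Corpus-level (WER, UER) in percent; assumes non-empty references."""
    w_err = w_tot = c_err = c_tot = 0
    for ref, hyp in zip(refs, hyps):
        # word level: split on whitespace
        w_err += edit_distance(ref.split(), hyp.split())
        w_tot += len(ref.split())
        # unit level (assumption): every character, spaces included, is a unit
        c_err += edit_distance(list(ref), list(hyp))
        c_tot += len(ref)
    return 100.0 * w_err / w_tot, 100.0 * c_err / c_tot
```

For example, `error_rates(["the cat sat"], ["the bat sit"])` gives a WER of about 66.7% (2 of 3 words wrong) but a UER of only about 18.2% (2 of 11 characters wrong).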

Thank you very much.

mailong25 commented 3 years ago

You should take a look at some of the WER-related threads in the fairseq issue tracker: https://github.com/pytorch/fairseq/search?q=wer+wav2vec+2.0&type=issues

mailong25 commented 3 years ago

You can take a look at my log file for reference: https://github.com/mailong25/self-supervised-speech-recognition/blob/master/examples/hydra_train_finetune.log
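One way to compare your run with the reference log is to pull the per-epoch validation metrics out of both logs and check which epoch scored best. The sketch below assumes each validation line ends with a JSON dict such as `{"epoch": 3, "valid_loss": "...", "valid_wer": "..."}`, which is what fairseq prints with `log_format: json`. The helper names `valid_metrics` and `best_epoch` are hypothetical and not part of this repo or fairseq.

```python
import json


def valid_metrics(log_lines, metric="valid_wer"):
    """Collect (epoch, value) pairs for `metric` from JSON-formatted log lines.

    Assumption: validation lines end with a JSON dict containing "epoch" and
    the metric key; values may be logged as strings, so they are cast to float.
    """
    points = []
    for line in log_lines:
        start = line.find("{")
        if start < 0:
            continue  # no dict on this line
        try:
            stats = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # e.g. a config dump that is not valid JSON
        if isinstance(stats, dict) and metric in stats and "epoch" in stats:
            points.append((int(stats["epoch"]), float(stats[metric])))
    return points


def best_epoch(points):
    """Epoch with the lowest metric value (earliest epoch on ties)."""
    return min(points, key=lambda p: (p[1], p[0]))[0]
```

Passing an open file handle (for example `open("hydra_train_finetune.log")`) to `valid_metrics` works because iterating a file yields its lines. Running it on both logs with `metric="valid_loss"` and `metric="valid_wer"` shows whether your valid loss starts rising at a similar point in training.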