keonlee9420 / Expressive-FastSpeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

finetune on small data #9

Closed MrYANG23 closed 2 years ago

MrYANG23 commented 2 years ago

Hi, I have a question about fine-tuning. I trained a FastSpeech2 model on a large dataset (AISHELL-3). When I fine-tune the pretrained model on small data (maybe 3 speakers), can I just change the number of speakers in nn.Embedding and load the pretrained checkpoint except for the nn.Embedding part, like this? (screenshot attached)
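The partial loading described above can be sketched as follows. This is a minimal, hypothetical example (the module names `TinyTTS`, `speaker_emb`, and `proj` are illustrative, not the repo's actual code): keys belonging to the mismatched speaker embedding are filtered out of the checkpoint, and the rest is loaded with `strict=False`.

```python
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    """Stand-in for FastSpeech2: a speaker lookup table plus shared weights."""
    def __init__(self, n_speakers, d_model=8):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, d_model)  # size depends on the dataset
        self.proj = nn.Linear(d_model, d_model)               # stands in for the rest of the model

pretrained = TinyTTS(n_speakers=218)   # e.g. trained on the large corpus
model = TinyTTS(n_speakers=3)          # fine-tuning target with 3 speakers

ckpt = pretrained.state_dict()
# Drop every key that belongs to the mismatched speaker embedding,
# then load the remaining weights non-strictly.
filtered = {k: v for k, v in ckpt.items() if not k.startswith("speaker_emb")}
missing, unexpected = model.load_state_dict(filtered, strict=False)
# `missing` now lists only the freshly initialized speaker embedding keys.
```

With `strict=False`, `load_state_dict` reports the skipped keys instead of raising, so the new 3-speaker embedding stays randomly initialized while all shared weights carry over.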

keonlee9420 commented 2 years ago

Hi @MrYANG23, thanks for your attention. That's because the speaker number must match the pre-trained model. You can change the speaker number for fine-tuning by following here.

MrYANG23 commented 2 years ago

Thank you, you have done so much nice work. How many sentences per speaker are needed to get a good result when fine-tuning on small data (3 speakers)? I tested with about 50 sentences per speaker; during training the result is okay, but on validation the result is not good. Train log: (screenshot attached) Val log: (screenshot attached)

keonlee9420 commented 2 years ago

@MrYANG23 sorry for the very late response. I think 150 sentences in total is not enough for the base configuration. You can reduce the model size and add a regularization term to the loss function, or apply weight decay or dropout.
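The weight-decay and dropout suggestions above can be sketched like this. The layer sizes and hyperparameter values here are illustrative assumptions, not tuned settings from the repo:

```python
import torch
import torch.nn as nn

# Smaller stand-in model with extra dropout to fight overfitting
# on a tiny fine-tuning set (~150 sentences total).
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # hypothetical: raise dropout for small data
    nn.Linear(16, 16),
)

# weight_decay adds an L2 penalty on the parameters at each optimizer step
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-6)
```

Dropout regularizes during training only (disabled by `model.eval()`), while weight decay shrinks all parameters continuously; both help close the train/validation gap seen in the logs above.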

keonlee9420 commented 2 years ago

Close due to inactivity.