Closed: MrYANG23 closed this issue 2 years ago
Hi @MrYANG23, thanks for your attention. That's because the speaker number has to match the pre-trained model. You can change the speaker number for fine-tuning by following the instructions here.
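For illustration, a minimal sketch of why the speaker count has to match (the embedding dimension and speaker numbers below are placeholders, not this repo's actual configuration):

```python
import torch.nn as nn

D_MODEL = 256                  # speaker embedding dimension; illustrative
N_PRETRAIN_SPEAKERS = 218      # speaker count used in pre-training; illustrative
N_FINETUNE_SPEAKERS = 3        # speakers in the small fine-tuning set

# The checkpoint stores a speaker embedding of shape
# (N_PRETRAIN_SPEAKERS, D_MODEL); a model built with a different
# speaker count cannot load that tensor directly.
pretrained_emb = nn.Embedding(N_PRETRAIN_SPEAKERS, D_MODEL)
finetune_emb = nn.Embedding(N_FINETUNE_SPEAKERS, D_MODEL)

try:
    finetune_emb.load_state_dict(pretrained_emb.state_dict())
except RuntimeError as e:
    print("size mismatch:", e)  # the error the mismatched speaker number causes
```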
Thank you, you have done so much nice work. How many sentences per speaker are needed to get a good result when fine-tuning on small data (3 speakers)? I tested with about 50 sentences per speaker; during training the result is OK, but on validation the result is not good. The training log looks like this: The validation log looks like this:
@MrYANG23 sorry for the very late response. I think 150 sentences in total is not enough for the base configuration. You can reduce the model size and add a regularization term to the loss function, or apply weight decay or dropout.
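As a rough illustration of the weight-decay/dropout suggestion (the layer sizes and hyperparameter values are placeholders, not values from this repo):

```python
import torch
import torch.nn as nn

# Stand-in for a down-sized acoustic model; the Dropout layer adds the
# regularization mentioned above.
model = nn.Sequential(
    nn.Linear(80, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # dropout as regularization
    nn.Linear(128, 80),
)

# Weight decay (L2 penalty) applied through the optimizer; tune the
# strength on the validation loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```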
Closing due to inactivity.
Hi, I have an issue about fine-tuning. A FastSpeech2 model was trained on a large dataset (AISHELL-3). When I fine-tune the pretrained model on small data (maybe 3 speakers), is it OK if I just change the speaker number in the nn.Embedding and load the pretrained checkpoint except for the nn.Embedding part, like this:
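For reference, a minimal sketch of that kind of partial checkpoint loading, with hypothetical model, key, and file names (the actual attribute and checkpoint key names depend on the model definition):

```python
import torch
import torch.nn as nn

# Hypothetical FastSpeech2-like module; only the speaker embedding
# differs in size from the pretrained checkpoint.
class TinyAcoustic(nn.Module):
    def __init__(self, n_speakers, d_model=256):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        self.encoder = nn.Linear(d_model, d_model)

model = TinyAcoustic(n_speakers=3)  # fine-tuning target: 3 speakers

# "pretrained.pth" is an illustrative path.
ckpt = torch.load("pretrained.pth", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt

# Drop the speaker-embedding weights whose shape no longer matches,
# then load everything else non-strictly.
filtered = {k: v for k, v in state_dict.items() if not k.startswith("speaker_emb")}
model.load_state_dict(filtered, strict=False)
```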