liusongxiang / efficient_tts

PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"
MIT License

High eval mel loss when training on Mandarin datasets #7

Closed by Charlottecuc 3 years ago

Charlottecuc commented 3 years ago

Hi. Thank you for your implementation. I trained the model on some Mandarin datasets (12,000 training / 100 eval utterances) for about 695k steps. The train/mel loss is about 0.12 and the train/dur loss is about 0.0158; the eval/dur loss is about 0.07, but the eval/mel loss is high (~0.84). I also notice that the model sometimes fails to synthesize reduplicated words (e.g. 嗯嗯, 叽叽喳喳) and tone-5, i.e. neutral-tone (轻读), words (e.g. 哎呀).
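A train/eval mel-loss gap this large can come from more than overfitting: if the eval mel loss is computed with predicted rather than ground-truth (teacher-forced) durations, even accurate mel frames are penalized for timing drift. A toy illustration of that effect in plain Python (a hypothetical sketch, not code from this repo):

```python
# Toy illustration (not efficient_tts code): frame misalignment alone
# inflates an L1 mel loss, even when the predicted frames are accurate.

def l1_loss(a, b):
    """Mean absolute error between two equal-length frame sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# A 1-D stand-in for a mel trajectory: a smooth ramp of 100 frames.
target = [i / 100 for i in range(100)]

# Perfect prediction, perfectly aligned (ground-truth durations).
aligned = list(target)

# The same prediction shifted by 3 frames, as happens when a duration
# predictor over- or under-shoots: content is right, timing is off.
shift = 3
misaligned = target[shift:] + target[-1:] * shift

print(l1_loss(target, aligned))     # 0.0
print(l1_loss(target, misaligned))  # > 0, purely from the time shift
```

So comparing the teacher-forced eval loss against the free-running one is a quick way to separate genuine overfitting from duration-prediction error.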

If you don't mind, could you tell me how you processed the DataBaker dataset, what your input text looks like, and how to solve the problems mentioned above? Thank you very much for your help.
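For reference, a common input format for Mandarin TTS is tone-numbered pinyin, where the neutral tone (轻读) is written as tone 5 and reduplicated words simply repeat their syllable tokens. A hypothetical frontend sketch along those lines (the actual DataBaker preprocessing in this repo may differ):

```python
# Hypothetical Mandarin frontend sketch: split tone-numbered pinyin
# syllables into initial/final+tone tokens. Not this repo's code; the
# real DataBaker preprocessing may use a different phone set.

INITIALS = [
    "zh", "ch", "sh",  # two-letter initials must be matched first
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syl):
    """'ya5' -> ('y', 'a', '5'); syllables with no initial keep ''."""
    tone = syl[-1] if syl[-1].isdigit() else "5"  # unmarked = neutral
    body = syl[:-1] if syl[-1].isdigit() else syl
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone

def to_tokens(pinyin_text):
    """Flatten a whitespace-separated pinyin string into model tokens."""
    tokens = []
    for syl in pinyin_text.split():
        ini, fin, tone = split_syllable(syl)
        if ini:
            tokens.append(ini)
        tokens.append(fin + tone)  # attach the tone to the final
    return tokens

# 哎呀 with a neutral-tone second syllable, and reduplicated 嗯嗯:
print(to_tokens("ai1 ya5"))  # ['ai1', 'y', 'a5']
print(to_tokens("en1 en1"))  # ['en1', 'en1']
```

With a scheme like this, reduplicated words and tone-5 syllables are just ordinary token sequences, so failures on them usually point to data coverage rather than the frontend itself.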

liusongxiang commented 3 years ago

Thanks for your interest. For "The train/mel loss is about 0.12 and the train/dur loss is about 0.0158; the eval/dur loss is about 0.07, but the eval/mel loss is high (~0.84)":

For "I also notice that the model sometimes fails to synthesize reduplicated words (e.g. 嗯嗯, 叽叽喳喳) and tone-5 (轻读) words (e.g. 哎呀)":

For "Could you tell me how you processed the DataBaker dataset, what your input text looks like, and how to solve the problems mentioned above?":

Hope this can help.

Charlottecuc commented 3 years ago

Thank you for your reply~