liusongxiang / efficient_tts

PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"
MIT License

High eval mel loss when training on Mandarin datasets #7

Closed by Charlottecuc 3 years ago

Charlottecuc commented 3 years ago

Hi. Thank you for your implementation. I trained the model on some Mandarin datasets (12,000 training / 100 eval utterances) for about 695k steps. The train/mel loss is about 0.12 and the train/dur loss is about 0.0158; the eval/dur loss is about 0.07, but the eval/mel loss is high (~0.84). I also notice that the model sometimes fails to synthesize reduplicated words (e.g. 嗯嗯, 叽叽喳喳) and tone-5, i.e. neutral-tone (轻读), words (e.g. 哎呀).
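A train/eval mel-loss gap this large can come from more than overfitting: if the eval mel loss is computed with predicted rather than ground-truth (teacher-forced) durations, even accurate mel frames are penalized for timing drift. A toy illustration of that effect in plain Python (a hypothetical sketch, not code from this repo):

```python
# Toy illustration (not efficient_tts code): frame misalignment alone
# inflates an L1 mel loss, even when the predicted frames are accurate.

def l1_loss(a, b):
    """Mean absolute error between two equal-length frame sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# A 1-D stand-in for a mel trajectory: a smooth ramp of 100 frames.
target = [i / 100 for i in range(100)]

# Perfect prediction, perfectly aligned (ground-truth durations).
aligned = list(target)

# The same prediction shifted by 3 frames, as happens when a duration
# predictor over- or under-shoots: content is right, timing is off.
shift = 3
misaligned = target[shift:] + target[-1:] * shift

print(l1_loss(target, aligned))     # 0.0
print(l1_loss(target, misaligned))  # > 0, purely from the time shift
```

So comparing the teacher-forced eval loss against the free-running one is a quick way to separate genuine overfitting from duration-prediction error.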

If you don't mind, could you tell me how you processed the DataBaker dataset, what your input text looks like, and how to solve the problems mentioned above? Thank you very much for your help.
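For reference, a common input format for Mandarin TTS is tone-numbered pinyin, where the neutral tone (轻读) is written as tone 5 and reduplicated words simply repeat their syllable tokens. A hypothetical frontend sketch along those lines (the actual DataBaker preprocessing in this repo may differ):

```python
# Hypothetical Mandarin frontend sketch: split tone-numbered pinyin
# syllables into initial/final+tone tokens. Not this repo's code; the
# real DataBaker preprocessing may use a different phone set.

INITIALS = [
    "zh", "ch", "sh",  # two-letter initials must be matched first
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syl):
    """'ya5' -> ('y', 'a', '5'); syllables with no initial keep ''."""
    tone = syl[-1] if syl[-1].isdigit() else "5"  # unmarked = neutral
    body = syl[:-1] if syl[-1].isdigit() else syl
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone

def to_tokens(pinyin_text):
    """Flatten a whitespace-separated pinyin string into model tokens."""
    tokens = []
    for syl in pinyin_text.split():
        ini, fin, tone = split_syllable(syl)
        if ini:
            tokens.append(ini)
        tokens.append(fin + tone)  # attach the tone to the final
    return tokens

# 哎呀 with a neutral-tone second syllable, and reduplicated 嗯嗯:
print(to_tokens("ai1 ya5"))  # ['ai1', 'y', 'a5']
print(to_tokens("en1 en1"))  # ['en1', 'en1']
```

With a scheme like this, reduplicated words and tone-5 syllables are just ordinary token sequences, so failures on them usually point to data coverage rather than the frontend itself.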

liusongxiang commented 3 years ago

Thanks for your interest. For "The train/mel loss is about 0.12 and the train/dur loss is about 0.0158; the eval/dur loss is about 0.07, but the eval/mel loss is high (~0.84)":

For "I also notice that the model sometimes fails to synthesize reduplicated words (e.g. 嗯嗯, 叽叽喳喳) and tone-5 (轻读) words (e.g. 哎呀)":

For "Could you tell me how you processed the DataBaker dataset, what your input text looks like, and how to solve the problems mentioned above?":

Hope this can help.

Charlottecuc commented 3 years ago

Thank you for your reply~