syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

the model is hard to converge with LJSpeech #18

Open zyj008 opened 6 years ago

zyj008 commented 6 years ago

Hi! Thanks for your contribution! I trained the model on the LJSpeech dataset with your code, but the loss does not converge with your default hparams. Here are some results from TensorBoard. Could you give me some advice?

  1. batch_size=32 lr=0.002 image

  2. batch_size=32 lr=0.001 image

  3. batch_size=64 lr=0.001 image

  4. batch_size=64 lr=0.0006 image

  5. batch_size=32 lr=0.0001 image

  6. batch_size=32 lr=0.00002 image

Finally, the model seems to converge, but the alignment is not good. The step-51000-align.png looks like this. Should I keep training, or kill this process and try other hparams? Can you give me some advice? image
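For reference, the six settings listed above could be kept in one place and turned into override strings, so each run is easy to repeat. This is only a sketch: `format_overrides` is a hypothetical helper, `initial_learning_rate` is an assumed hparam name for what is written as `lr` above, and the comma-separated `name=value` format is an assumption about how the training script accepts overrides.

```python
# The six (batch_size, learning rate) settings from the list above, in the order tried.
TRIALS = [
    (32, 0.002),
    (32, 0.001),
    (64, 0.001),
    (64, 0.0006),
    (32, 0.0001),
    (32, 0.00002),
]


def format_overrides(batch_size, learning_rate, lr_name="initial_learning_rate"):
    """Build a comma-separated name=value string for one trial.

    lr_name is an assumption; check hparams.py for the real name.
    """
    return "batch_size={},{}={:g}".format(batch_size, lr_name, learning_rate)


if __name__ == "__main__":
    # Print one override string per trial.
    for batch_size, learning_rate in TRIALS:
        print(format_overrides(batch_size, learning_rate))
```

Keeping the trials as data makes it harder to lose track of which curve belongs to which setting.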

sergiodaia commented 6 years ago

If it helps, I used the default hparams with LJSpeech 1.1, and training started to produce intelligible sound at step 18k.

step-18000-align

I'd recommend restarting training, but I'm not sure, since there seems to be some randomization that might shift when convergence happens.
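One way to check whether randomization is what shifts convergence is to fix the seeds before building the model, then compare two runs with the same hparams. A minimal sketch, assuming a hypothetical helper `seed_everything` that is not part of this repository; it only seeds the libraries that happen to be installed, and it may not remove all run-to-run variation (for example from GPU kernels or data-loading threads).

```python
import random


def seed_everything(seed):
    """Seed the random number generators available in this environment."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.

    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed; nothing more to seed.

    # TF 2.x provides tf.random.set_seed; TF 1.x provides tf.set_random_seed.
    # In TF 1.x graph mode, call this before the graph is constructed.
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)
    else:
        tf.set_random_seed(seed)


if __name__ == "__main__":
    seed_everything(1234)  # arbitrary example seed
```

If two seeded runs still diverge at very different steps, the seed alone is probably not the explanation.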