prateekgupta891 opened this issue 4 years ago
I have been training on an emotion dataset that has multiple emotions for the same text. During training, at every checkpoint it generates an audio file from some text (I don't know how it samples the text from the training data), and that audio sounds good. But if I take that checkpoint and pass it to demo_server.py to synthesize the same text, the result is terrible. I have already trained for 200K iterations, but demo_server still produces nothing except a muffled voice and some noise. Is there a difference between what eval and demo_server do? Please help!

It has to do with "teacher forcing": eval synthesizes the utterances it is training on while using teacher forcing, so the decoder is fed ground-truth frames at each step. The demo server doesn't use teacher forcing. So yes, there is a difference in how they will sound, and eval will sound better. I'm not really an expert, so that's all I know.
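To make the distinction concrete, here is a minimal sketch of the difference between the two decoding modes. This is a hypothetical toy decoder, not Tacotron's actual architecture (the real decoder also attends over encoder outputs); the names `TinyDecoder`, `frame_dim`, and `hidden_dim` are illustrative assumptions. The key point is which frame gets fed back into the decoder at each step:

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy autoregressive frame decoder, illustrating teacher forcing only."""

    def __init__(self, frame_dim=8, hidden_dim=16):
        super().__init__()
        self.rnn = nn.GRUCell(frame_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, frame_dim)
        self.frame_dim = frame_dim
        self.hidden_dim = hidden_dim

    def forward(self, target_frames=None, steps=None):
        # target_frames: (T, frame_dim) ground truth; if given, teacher force.
        T = target_frames.shape[0] if target_frames is not None else steps
        h = torch.zeros(1, self.hidden_dim)
        frame = torch.zeros(1, self.frame_dim)  # <GO> frame
        outputs = []
        for _t in range(T):
            h = self.rnn(frame, h)
            pred = self.proj(h)
            outputs.append(pred)
            if target_frames is not None:
                # Teacher forcing (training/eval): the next input is the
                # ground-truth frame, so prediction errors never compound.
                frame = target_frames[_t].unsqueeze(0)
            else:
                # Free running (demo_server-style inference): the next input
                # is the model's own prediction, so early mistakes feed back
                # into every later step.
                frame = pred
        return torch.cat(outputs, dim=0)

torch.manual_seed(0)
dec = TinyDecoder()
gt = torch.randn(10, 8)
tf_out = dec(target_frames=gt)  # teacher-forced, like eval audio
free_out = dec(steps=10)        # free-running, like demo_server.py
print(tf_out.shape, free_out.shape)  # prints: torch.Size([10, 8]) torch.Size([10, 8])
```

Because eval's decoder is corrected by the ground truth at every step, its audio is an optimistic estimate of quality; free-running inference is the honest test. A 200K-iteration model that still sounds muffled at inference usually means attention never aligned, which teacher forcing can mask.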