keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License

Audio generated using eval is not the same as audio generated by demo_server for the same checkpoint #314

Open prateekgupta891 opened 4 years ago

prateekgupta891 commented 4 years ago

I have tried training on an emotion dataset which has multiple emotions for the same text. While training, at every checkpoint it generates an audio file using some text from the training data (I don't know how it samples the text), and the audio sounds good too. But if I take that model file and give it as a checkpoint to the demo_server.py code and generate audio for the same text, it does a terrible job. I have already trained it for 200K iterations, but still I am not able to generate anything except a muffled voice and some noise using the demo_server code. Is there a difference between the eval and the demo_server code? Please help!!

ghost commented 4 years ago

It has to do with "teacher forcing". Basically, eval is outputting what it is training on while using teacher forcing, whereas the demo server doesn't use teacher forcing. So yes, there is a difference in how they will sound, and eval will sound better. I'm not really an expert, so that's all I know.
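For intuition, here is a minimal sketch of the difference between the two decoding modes. This is a hypothetical `decode` helper written for illustration, not the actual code in this repo; the real Tacotron decoder predicts multiple mel frames per step and uses attention, but the feedback loop is the key point:

```python
import numpy as np

def decode(decoder_step, mel_targets=None, num_steps=100, frame_dim=80):
    """Illustrative decoder loop (hypothetical, not this repo's code).

    decoder_step: callable mapping the previous mel frame to the next one.
    mel_targets:  ground-truth mel frames; if given, teacher forcing is used.
    """
    prev_frame = np.zeros(frame_dim)  # <GO> frame
    outputs = []
    steps = len(mel_targets) if mel_targets is not None else num_steps
    for t in range(steps):
        frame = decoder_step(prev_frame)  # predict the next mel frame
        outputs.append(frame)
        if mel_targets is not None:
            # Teacher forcing (the audio saved during training): feed the
            # *ground-truth* frame back in, so prediction errors never compound.
            prev_frame = mel_targets[t]
        else:
            # Free running (demo_server.py): feed the model's *own* prediction
            # back in, so errors accumulate if the model hasn't converged.
            prev_frame = frame
    return np.stack(outputs)

# Toy usage: a "decoder" that just decays the previous frame.
step = lambda prev: 0.9 * prev + 0.1
targets = np.random.rand(50, 80)
teacher_forced = decode(step, mel_targets=targets)  # checkpoint-eval style
free_running = decode(step, num_steps=50)           # demo_server style
```

So the checkpoint audio is generated with the ground truth propping the decoder up at every step, while demo_server has to survive on its own outputs. A model that sounds fine under teacher forcing but produces muffled noise when free-running usually hasn't learned a stable attention alignment yet.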