Huge qualitative difference between training-time synthesis and custom text synthesis

r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

https://r9y9.github.io/deepvoice3_pytorch/

Huge qualitative difference between training-time synthesis and custom text synthesis #188

Closed by Arafat4341 4 years ago

Arafat4341 commented 4 years ago

Hello! I am working with JSUT. During training, the model generates synthesized audio at each checkpoint, every 10,000 steps. But when I synthesize my own text with that same checkpoint, I see a huge difference in quality. In fact, the custom synthesized audio is nowhere near the training-time audio in terms of quality. Why does this happen? And what is the audio that is generated during training?

JohnHerry commented 3 years ago

Me too!