r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

Bad results on vctk #155

Closed. mrgloom closed this issue 5 years ago

mrgloom commented 5 years ago

I got bad results on the VCTK dataset after 500,000 training steps. Here is a sample: example.zip

My dataset preparation: time python preprocess.py --preset=presets/deepvoice3_vctk.json vctk /data_large/tts-datasets/VCTK-Corpus /data_large/tts-datasets/VCTK-Corpus-prepared

From what I can see here, though, there seems to be an option to use *.lab files: https://github.com/r9y9/deepvoice3_pytorch/blob/master/vctk.py#L61

How was 20171222_deepvoice3_vctk108_checkpoint_step000300000.pth trained?

r9y9 commented 5 years ago

Please see https://github.com/r9y9/deepvoice3_pytorch/tree/master/vctk_preprocess and prepare phone alignments before preprocessing. As I recall, the silent parts of the recordings hurt performance.
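
To illustrate why alignments help, here is a minimal sketch of trimming leading and trailing silence from a waveform using a phone alignment. It is not the repository's code. It assumes the *.lab files are HTS-style monophone labels (one `start end label` entry per line, with times in 100 ns units) and that silence is labeled `sil` or `pau`. The function names `parse_lab`, `speech_bounds`, and `trim_silence` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: HTS-style labels store times in 100 ns units (10,000,000 per second).
HTS_UNITS_PER_SECOND = 10_000_000
# Assumption: these label names mark silence/pause in the alignment.
SILENCE_LABELS = {"sil", "pau"}

Segment = Tuple[int, int, str]  # (start, end, label)


def parse_lab(text: str) -> List[Segment]:
    """Parse 'start end label' lines; lines with fewer than 3 fields are skipped."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments


def speech_bounds(segments: Sequence[Segment], sample_rate: int) -> Tuple[int, int]:
    """Return (begin, end) sample indices spanning the first to last non-silent phone."""
    voiced = [(s, e) for s, e, label in segments if label not in SILENCE_LABELS]
    if not voiced:
        raise ValueError("alignment contains no non-silent segments")
    # Integer arithmetic avoids floating-point rounding at segment boundaries.
    begin = voiced[0][0] * sample_rate // HTS_UNITS_PER_SECOND
    end = voiced[-1][1] * sample_rate // HTS_UNITS_PER_SECOND
    return begin, end


def trim_silence(samples: Sequence, segments: Sequence[Segment], sample_rate: int):
    """Drop leading/trailing silence; silence between phones is kept."""
    begin, end = speech_bounds(segments, sample_rate)
    return samples[begin:end]


if __name__ == "__main__":
    lab = "0 5000000 sil\n5000000 15000000 hh\n15000000 20000000 sil\n"
    audio = list(range(32000))  # stand-in for 2 s of audio at 16 kHz
    trimmed = trim_silence(audio, parse_lab(lab), 16000)
    print(len(trimmed))
```

Only the outer silence is removed here. Pauses inside an utterance are left in place, since removing them would change the timing of the speech.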

mrgloom commented 5 years ago

Is alignment needed only at training time?

r9y9 commented 5 years ago

Only at preprocessing time.

mrgloom commented 5 years ago

I mean, once the model is trained, does the input text to the TTS system still need to be preprocessed in some way (i.e., do lab files need to be generated)?

r9y9 commented 5 years ago

They are needed only at data preparation time. You don't need lab files at inference time.

mennatallah644 commented 5 years ago

@mrgloom, would you please share your 500,000-step model here?