mrgloom closed this issue 5 years ago
Please see https://github.com/r9y9/deepvoice3_pytorch/tree/master/vctk_preprocess and prepare phone alignments before preprocessing. As I recall, leaving the silent segments in hurt performance.
Is alignment needed only at training time?
Only at preprocessing time.
I mean, once the model is trained, does the input text to the TTS still need to be preprocessed in some way (i.e., do lab files have to be obtained)?
Needed only at data preparation time. You don't need lab files at inference time.
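For readers wondering what the alignments are used for at data preparation time, here is a minimal sketch of trimming leading and trailing silence from a label file. It is not the repository's code. It assumes an HTS-style label file with one `start end label` entry per line, times in units of 100 ns, and plain silence labels such as `sil` or `pau`; the function name `speech_span` and the `silence_labels` parameter are hypothetical. Labels produced by a real aligner may be full-context strings and need different matching.

```python
# Hypothetical helper for illustration; not taken from deepvoice3_pytorch.
def speech_span(lab_text, sample_rate, silence_labels=("sil", "pau", "sp")):
    """Return (begin, end) sample indices that cover the non-silent phones.

    Assumes each line is "start end label" with times in 100 ns units.
    """
    segments = []
    for line in lab_text.strip().splitlines():
        start, end, label = line.split()[:3]
        segments.append((int(start), int(end), label))

    # Keep only segments whose label is not one of the assumed silence labels.
    voiced = [seg for seg in segments if seg[2] not in silence_labels]
    if not voiced:
        return 0, 0

    def to_samples(t):
        # 10,000,000 units of 100 ns make one second; integer math avoids
        # floating-point rounding.
        return t * sample_rate // 10_000_000

    return to_samples(voiced[0][0]), to_samples(voiced[-1][1])


example_lab = """\
0 2500000 sil
2500000 7000000 hh
7000000 12500000 ax
12500000 15000000 sil
"""

begin, end = speech_span(example_lab, sample_rate=16000)
print(begin, end)  # 4000 20000
```

The returned indices would then be used to slice the waveform (for example `wav[begin:end]`) before feature extraction, which is why nothing like this is needed at inference time.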
@mrgloom, could you please share your 500,000-step model here?
I got bad results on the VCTK dataset after 500,000 steps. Here is a sample: example.zip
My dataset preparation:
time python preprocess.py --preset=presets/deepvoice3_vctk.json vctk /data_large/tts-datasets/VCTK-Corpus /data_large/tts-datasets/VCTK-Corpus-prepared
But from what I can see here, there seems to be an option to use *.lab files: https://github.com/r9y9/deepvoice3_pytorch/blob/master/vctk.py#L61
How was 20171222_deepvoice3_vctk108_checkpoint_step000300000.pth trained?