[Open] Eleanor456 opened this issue 4 years ago
What datasets and presets are you using?
A Chinese dataset with 61 speakers, and a preset I modified based on deepvoice3_vctk.json.
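For reference, the speaker-related fields I touched relative to deepvoice3_vctk.json look roughly like this (the field names follow the deepvoice3_pytorch preset format; the values shown are illustrative, not the exact ones I used):

```json
{
  "n_speakers": 61,
  "speaker_embed_dim": 16,
  "batch_size": 16
}
```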
Which frontend did you select? I'm trying to train on Spanish speakers and the results are a little gibberish, but not noise.
I converted the transcript to pinyin, so I selected the en frontend. I think the bad results may be because the number of training epochs is not enough.
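A minimal sketch of the transcript-to-pinyin step I mean. The tiny hand-made lexicon here is purely illustrative; a real setup would use a dedicated library (e.g. pypinyin) to cover the full character set:

```python
# Hypothetical mini-lexicon mapping characters to tone-numbered pinyin.
# In practice this table would come from a pinyin library, not be hand-written.
MINI_LEXICON = {"你": "ni3", "好": "hao3"}

def to_pinyin(text):
    """Replace each Chinese character with tone-numbered pinyin,
    leaving unknown characters (punctuation, digits) unchanged."""
    return " ".join(MINI_LEXICON.get(ch, ch) for ch in text)

print(to_pinyin("你好"))  # -> ni3 hao3
```

Once the transcripts are romanized like this, the en frontend can tokenize them as if they were ordinary ASCII text.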
It shouldn't be so noisy. This is what I get with 40,000 steps on a 13-speaker dataset.
es frontend, so no phonetics dictionary
This is the result after training for 61,000 steps with a batch size of 64.
It is slightly better than before, so I plan to continue training and observe the results.
Please let me know how well it goes with that batch size.
I have the same problem. I am using the MAGICDATA dataset with 1,016 speakers; training for 1,500,000~2,000,000 steps gave good results during the training process, but inference with these two models produced bad speech. @Eleanor456 is your model working well now?
When I generated audio from the checkpoint at 32,000 steps, the output was pure noise, and the alignment pictures are always empty, as shown below. How can I get inference results close to the normal-sounding audio obtained during training?
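One quick way to quantify "empty" alignment plots is a simple focus-rate check on the attention matrix (decoder steps x encoder steps): average the max attention weight per decoder step. A sharp, near-monotonic alignment scores close to 1.0, while a flat ("empty"-looking) one scores near 1/T. The matrices below are synthetic, and `focus_rate` is a hypothetical helper, not part of deepvoice3_pytorch:

```python
def focus_rate(alignment):
    """Average, over decoder steps, of the maximum attention
    weight assigned to any encoder position at that step."""
    return sum(max(row) for row in alignment) / len(alignment)

# Sharp diagonal attention: each decoder step attends to one position.
sharp = [[0.9, 0.05, 0.05],
         [0.05, 0.9, 0.05],
         [0.05, 0.05, 0.9]]
# Flat attention: the kind that renders as an "empty" alignment plot.
flat = [[1 / 3, 1 / 3, 1 / 3]] * 3

print(focus_rate(sharp))  # -> ~0.9
print(focus_rate(flat))   # -> ~0.33
```

If inference alignments score near the flat case while training-time (teacher-forced) alignments score high, the model is relying on ground-truth frames and has not learned a usable attention for free-running inference.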