Maybe you need to feed some better audio data set for better results

begeekmyfriend commented 6 years ago

I listened to the sample produced by the model after only 4K training steps. Unfortunately, it still contains some noise. I have not looked into the framework closely, but here are some suggestions about training data sets from an open-source Tacotron project. I think you may need better training data, such as THCHS30 or CVTE, for this brilliant framework.

FonzieTree commented 6 years ago

Yep. As you can see in the Tacotron implementation written by Kyubyong (https://github.com/Kyubyong/tacotron): "Yuxuan, the first author of the tacotron, advised me to do sanity-check first with small data, and to adjust hyperparameters since our dataset is different from his. I really appreciate his tips, and hope this would help you." Following his tips, I decided to first train a good model on a small data set. Once I get a noise-free model, I will test on a standard speech data set such as the LJ Speech Dataset (https://keithito.com/LJ-Speech-Dataset/), which I have already downloaded.

begeekmyfriend commented 6 years ago

Chinese is more popular!

FonzieTree commented 6 years ago

Couldn't agree more!

PetrochukM commented 6 years ago

Have you had a chance to train on the LJ Speech Dataset? Any samples?

ttsunion / Deep-Expression