ttsunion / Deep-Expression

An Attention Based Open-Source End to End Speech Synthesis Framework, No CNN, No RNN, No MFCC!!!

Maybe you need to feed some better audio data set for better results #1

Open begeekmyfriend opened 6 years ago

begeekmyfriend commented 6 years ago

I listened to the sample produced by the model after only 4K training steps. Unfortunately, it still contains some noise. I have not looked into the framework closely, but here are some suggestions from an open-source Tacotron project about training data sets. I think you may need better training samples, such as THCHS30 and CVTE, for this brilliant framework.

FonzieTree commented 6 years ago

Yep. As you can see in the tacotron repository (https://github.com/Kyubyong/tacotron) written by Kyubyong: "Yuxuan, the first author of the tacotron, advised me to do sanity-check first with small data, and to adjust hyperparameters since our dataset is different from his. I really appreciate his tips, and hope this would help you." Because of this advice, I decided to first train a good model on a small data set. Once I get a noise-free model, I will test it on a standard speech data set such as the LJ Speech Dataset (https://keithito.com/LJ-Speech-Dataset/), which I have already downloaded.

begeekmyfriend commented 6 years ago

Chinese is more popular!

FonzieTree commented 6 years ago

Couldn't agree more!

PetrochukM commented 6 years ago

Have you had a chance to train on the "LJ Speech Dataset"? Any samples?