Open wanshun123 opened 5 years ago
@wanshun123 Hi, I cannot open the data link, so I can't check the quality of the data. I have tried several different datasets before and found that the model works on them.
Besides, the attention used in this repo is a very basic one, which does not work well for generating long sentences.
@wanshun123 Did you train using use_gst=False? I have the same issue when use_gst=False but not when it is True.
@syang1993 In my case the audio is intelligible, although not of good quality. I am using the Emotional Speech Dataset (https://hltsingapore.github.io/ESD/download.html).
The English data shows similar attention "collapse". The Chinese data is ok.
I'm curious whether others have achieved reasonable results training on custom data. I've tried training the model on data from https://github.com/aomv/voiceloop-in-the-wild-experiments/tree/master/data/donald-trump/data (audio files of a few seconds each with transcriptions, roughly a couple of hours in total), with a metadata.csv file in the same format as the LJSpeech dataset.
Although I've trained for several hours with a steadily decreasing loss, the graph indicates the model is not learning properly. I have also failed, over several attempts, to generate intelligible audio, at least without using a reference audio.
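For anyone trying to reproduce this kind of setup, here is a minimal sketch of how a metadata.csv in the LJSpeech layout could be built and sanity-checked. LJSpeech uses one pipe-delimited row per clip (`id|transcription|normalized transcription`) with no header. The function names, the `transcripts` dict, and the `wavs/` folder layout are assumptions for illustration, not code from this repo, and the raw text is simply reused as the normalized text.

```python
from pathlib import Path


def write_ljspeech_metadata(transcripts, out_path):
    """Write {clip_id: text} as LJSpeech-style rows: id|text|normalized text.

    Assumption: the raw text is reused as the normalized text. LJSpeech
    itself expands numbers and abbreviations in the third column.
    """
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8", newline="") as f:
        for clip_id, text in sorted(transcripts.items()):
            text = " ".join(text.split())  # collapse newlines and repeated spaces
            if "|" in text:
                # A pipe inside the transcript would break the 3-column layout.
                raise ValueError(f"transcript for {clip_id} contains '|'")
            f.write(f"{clip_id}|{text}|{text}\n")
    return out_path


def check_metadata(metadata_path, wav_dir):
    """Return ids listed in metadata.csv that have no matching <id>.wav file."""
    missing = []
    with open(metadata_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("|")
            if len(parts) != 3:
                raise ValueError(f"line {line_no}: expected 3 fields, got {len(parts)}")
            if not (Path(wav_dir) / f"{parts[0]}.wav").is_file():
                missing.append(parts[0])
    return missing
```

Running `check_metadata` before training rules out one simple failure mode (rows that point at missing audio), though it says nothing about transcript accuracy or audio quality, which can also affect whether attention learns an alignment.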