mkotha / WaveRNN

A WaveRNN implementation
MIT License
198 stars 48 forks

I used it to successfully train a Chinese Mandarin model. But something went wrong... #10

Closed zhjy8827 closed 5 years ago

zhjy8827 commented 5 years ago

Hello, thank you very much for the code you provided. I used it to successfully train a Chinese Mandarin model, and the training process went fine.

[image] [image]

But something went wrong at synthesis time: about 40% of the time the generated audio is empty, with no content at all. Whenever the audio is empty, the batch size is always equal to 1.

[image]

Can you give some advice? Thank you!

hparams.py.zip

MlWoo commented 5 years ago

I think there is a bug in the code. Although a mask ensures that only the low half of r_t is connected to the current coarse input, r_t is multiplied point-wise with h_{t-1} and the result is then matrix-multiplied by R_e. Through that matrix multiplication, the current coarse input influences not only the low half of e_t but also the high half. In other words, the code infers the current coarse output from the current coarse input, so it fails when synthesising.

hdmjdp commented 4 years ago

I think there is a bug in the code. Although a mask ensures that only the low half of r_t is connected to the current coarse input, r_t is multiplied point-wise with h_{t-1} and the result is then matrix-multiplied by R_e. Through that matrix multiplication, the current coarse input influences not only the low half of e_t but also the high half. In other words, the code infers the current coarse output from the current coarse input, so it fails when synthesising.

I don't think this is right. I think there is no information about the current coarse input in the relevant part of r_t, because it has been filtered out by the mask. When r_t is passed on to the next step, e_t does not have coarse information in the high part.
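
For anyone following the disagreement above, a small NumPy sketch may help to test both readings. It is not the repository's code: the sizes, the random weights, the helper name `coarse_half_of_candidate`, and the choice of putting the coarse half in the first rows are all assumptions made for illustration. It builds a masked input matrix so that only the fine rows see the current coarse sample c_t, then compares two ways of applying the reset gate: element-wise after multiplying by R_e, and before multiplying by R_e (the ordering described in the bug report).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coarse_half_of_candidate(c_t, hidden=8, seed=0):
    """Return the coarse half of e_t under two orderings of the reset gate.

    Hypothetical helper for illustration only; weights are random.
    """
    rng = np.random.default_rng(seed)  # same seed, so same weights on every call
    half = hidden // 2                 # assumption: rows [:half] are the coarse half
    scale = 1.0 / np.sqrt(hidden)
    h_prev = rng.standard_normal(hidden)
    R_r = rng.standard_normal((hidden, hidden)) * scale
    R_e = rng.standard_normal((hidden, hidden)) * scale
    I_r = rng.standard_normal((hidden, 3))
    I_e = rng.standard_normal((hidden, 3))

    # Input is x_t = [c_{t-1}, f_{t-1}, c_t]. The mask zeroes the c_t column
    # for the coarse rows, so only the fine rows see the current coarse sample.
    mask = np.ones((hidden, 3))
    mask[:half, 2] = 0.0
    x_t = np.array([0.1, -0.3, c_t])

    r_t = sigmoid(R_r @ h_prev + (I_r * mask) @ x_t)
    inp = (I_e * mask) @ x_t

    # Ordering A: multiply by R_e first, then gate element-wise with r_t.
    gate_after = np.tanh(r_t * (R_e @ h_prev) + inp)
    # Ordering B: gate h_prev with r_t first, then let R_e mix all rows.
    gate_before = np.tanh(R_e @ (r_t * h_prev) + inp)
    return gate_after[:half], gate_before[:half]


after_a, before_a = coarse_half_of_candidate(c_t=0.0)
after_b, before_b = coarse_half_of_candidate(c_t=1.0)
print("ordering A, coarse half changes with c_t:", not np.allclose(after_a, after_b))
print("ordering B, coarse half changes with c_t:", not np.allclose(before_a, before_b))
```

Under these assumptions, ordering A keeps the coarse half of e_t independent of c_t, because each coarse row only ever touches the coarse rows of r_t. Ordering B does not: the fine rows of r_t depend on c_t, and the dense matrix R_e mixes them into the coarse rows. So the mask alone is only sufficient if the gate is applied element-wise after R_e; which ordering the repository actually uses is something to check in its model code, and this thread does not settle it.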