Open twidddj opened 6 years ago
Hi @twidddj, thanks for sharing your work!
I am assuming you trained the WaveNet vocoder on ground-truth mels? Did you try training it on ground-truth-aligned (GTA) samples generated with the Tacotron (r > 1) model? In the best case, the WaveNet will learn to map the mels correctly despite the noise in them. If it doesn't work, we'll try figuring out why attention isn't working with r=1. (I have not tested it yet; I get my GPU this week, so I'll tell you how it goes.)
Hi @Rayhane-mamah, welcome!
Yes, you are right. It was trained on ground-truth mels, not GTA. I have not tested GTA yet, and I plan to do it next week using @Keithito's pretrained model (r=5). If you tell me how yours goes, it will be very helpful to me. We might achieve something while we are at it.
Yes, I am counting on fully training my Tacotron and training a WaveNet on its GTA output in the upcoming week. I'll tell you how it goes.
We have tried a few things for this issue.
So far, I couldn't find a model whose attention works with "reduction factor" = 1. If we use a factor > 1, the prediction looks like the image below. That would be bad news for WaveNet performance.
Here, the original Mel-spectrum is
You can also find some discussion of this issue on @Rayhane-mamah's repo and @Keithito's repo.
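
For readers unfamiliar with the "reduction factor" r discussed in this thread: in Tacotron-style models the decoder emits r consecutive mel frames per step, so the attention only has to align roughly T / r decoder steps instead of T. Below is a minimal pure-Python sketch of that frame grouping. The helper name `group_frames`, the zero padding, and the toy 80-bin data are assumptions for illustration only; none of this is taken from the repos mentioned above.

```python
def group_frames(mel, r, pad_value=0.0):
    """Group a list of mel frames (T x n_mels) into decoder targets of r frames each.

    Hypothetical helper for illustration; not code from any repo in this thread.
    """
    if r < 1:
        raise ValueError("r must be >= 1")
    n_mels = len(mel[0]) if mel else 0
    # Pad the time axis so the number of frames is a multiple of r.
    remainder = len(mel) % r
    padded = list(mel)
    if remainder:
        padded += [[pad_value] * n_mels for _ in range(r - remainder)]
    # Each decoder step covers r consecutive frames, concatenated into one vector.
    steps = []
    for start in range(0, len(padded), r):
        step = []
        for frame in padded[start:start + r]:
            step.extend(frame)
        steps.append(step)
    return steps


# Toy input: 12 frames with 80 mel bins each (assumed sizes).
mel = [[float(t)] * 80 for t in range(12)]
steps_r1 = group_frames(mel, r=1)
steps_r5 = group_frames(mel, r=5)
print(len(steps_r1), len(steps_r1[0]))  # 12 80
print(len(steps_r5), len(steps_r5[0]))  # 3 400
```

With r=1 the decoder has to take 12 attention steps for this toy input, while with r=5 it takes only 3 (after padding to 15 frames). Fewer, coarser steps tend to make alignment easier to learn, but each step then has to predict several frames at once.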