Integration Tacotron - Githubissues

twidddj / tf-wavenet_vocoder

Wavenet and its applications with Tensorflow

MIT License

56 stars, 16 forks

Integration Tacotron #2

Open twidddj opened 6 years ago

twidddj commented 6 years ago

So far, I have not found a model whose attention works with "reduction factor" = 1. If we use a factor > 1, the predicted mel spectrogram looks like the image below, which is bad news for WaveNet performance. (image: teacher_forced_mel_prediction)

For comparison, here is the original mel spectrogram. (image: true_mel)

You can also find some discussion of this issue on @Rayhane-mamah's repo and @Keithito's repo.
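For readers unfamiliar with the reduction factor: in Tacotron, the decoder emits r mel frames per decoder step, so r > 1 means fewer attention steps per utterance. The following is only a minimal sketch of that grouping on toy data; `group_frames` and `toy_mel` are hypothetical names for illustration, not code from any of the repos mentioned here.

```python
def group_frames(mel, r):
    """Group mel frames into decoder steps of r frames each.

    mel: list of frames, each a list of n_mels floats.
    Returns a list of decoder targets, each of length r * n_mels.
    The tail is zero-padded so the frame count is a multiple of r.
    """
    n_mels = len(mel[0])
    pad = (-len(mel)) % r
    padded = mel + [[0.0] * n_mels for _ in range(pad)]
    return [
        [value for frame in padded[i:i + r] for value in frame]
        for i in range(0, len(padded), r)
    ]


# Toy input: 7 frames with 3 mel bins each (real models typically use 80 bins).
toy_mel = [[float(t)] * 3 for t in range(7)]

steps_r1 = group_frames(toy_mel, 1)  # one frame per decoder step
steps_r5 = group_frames(toy_mel, 5)  # five frames per decoder step
```

With r=1 the 7 frames take 7 decoder steps; with r=5 they take 2 steps, each predicting 5 frames at once. That coarser stepping is one plausible reason the predicted spectrogram loses fine detail.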

Rayhane-mamah commented 6 years ago

Hi @twidddj, thanks for sharing your work!

I am assuming you trained the WaveNet vocoder on ground-truth mels? Did you try training it on ground-truth-aligned (GTA) samples generated with the Tacotron (r > 1) model? In the best case, the WaveNet will learn to map the mels correctly despite the noise in them. If it doesn't work, we'll try to figure out why attention isn't working with r=1. (I have not tested it yet; I get my GPU this week, so I'll tell you how it goes.)
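To make the GTA idea concrete: the Tacotron decoder is run with teacher forcing, so its predicted mels stay frame-aligned with the recorded audio, and the vocoder is then trained on (predicted mel, real audio) pairs. The sketch below shows only the pairing step. `predict_mel_teacher_forced` is a hypothetical callable standing in for a trained Tacotron, and the fake predictor just offsets the true mel so the example runs on its own.

```python
def align_audio_to_mel(audio, n_frames, hop_size):
    """Trim or zero-pad audio so it spans exactly n_frames * hop_size samples."""
    target = n_frames * hop_size
    if len(audio) >= target:
        return audio[:target]
    return audio + [0.0] * (target - len(audio))


def build_gta_pairs(utterances, predict_mel_teacher_forced, hop_size):
    """Build (GTA mel, audio) training pairs for a vocoder.

    utterances: iterable of (text, audio, true_mel) tuples.
    predict_mel_teacher_forced: hypothetical callable (text, true_mel) -> mel
        that runs the decoder while feeding it ground-truth frames.
    """
    pairs = []
    for text, audio, true_mel in utterances:
        gta_mel = predict_mel_teacher_forced(text, true_mel)
        pairs.append((gta_mel, align_audio_to_mel(audio, len(gta_mel), hop_size)))
    return pairs


def fake_teacher_forced_predictor(text, true_mel):
    """Stand-in for a trained model: returns the true mel with a small error."""
    return [[value + 0.1 for value in frame] for frame in true_mel]


hop_size = 300                           # samples per mel frame (assumed)
true_mel = [[0.0] * 3 for _ in range(4)] # 4 frames, 3 bins (toy size)
audio = [0.0] * 1250                     # slightly longer than 4 * 300 samples
pairs = build_gta_pairs(
    [("hello", audio, true_mel)], fake_teacher_forced_predictor, hop_size
)
```

Because teacher forcing keeps the frame count equal to that of the ground truth, each GTA mel can be paired with its original waveform without any extra alignment step.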

twidddj commented 6 years ago

Hi @Rayhane-mamah, welcome!

Yes, you are right. It was trained on ground-truth mels, not GTA. I have not tested GTA training yet; I plan to do it next week using @Keithito's pretrained model (r=5). If you let me know how yours goes, it will be very helpful to me. Hopefully we will get a good result while we are at it.

Rayhane-mamah commented 6 years ago

Yes, I am counting on fully training my Tacotron and training a WaveNet on its GTA output in the upcoming week. I'll tell you how it goes.

twidddj commented 6 years ago

We have tried a few things for this issue:

- Tested Rayhane-mamah's Tacotron-2 with r=1. Its attention works, and the intelligibility of the TTS output improved remarkably compared to the previous version. However, there is another issue, which he has reported. We believe that problem will be solved soon. Thanks!
- Tested our vocoder on mel spectrograms computed with the same method as the Tacotron2 paper (2048 fft_size, 300 hop_size, 1300 window_size at a 24 kHz sample rate). Although it seems to require more training steps (over 1000K) than r9y9's setting, it works too. Thanks to @Ondal90!
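As a quick sanity check on those analysis settings, the sketch below converts between samples and milliseconds at 24 kHz. The constant and function names are illustrative only. The Tacotron 2 paper specifies a 50 ms frame and a 12.5 ms hop, which correspond to 1200 and 300 samples at this rate; the window_size of 1300 is reproduced here as written in the comment above.

```python
SAMPLE_RATE = 24000  # Hz
FFT_SIZE = 2048      # samples
HOP_SIZE = 300       # samples
WINDOW_SIZE = 1300   # samples, as stated in the comment above


def samples_to_ms(n_samples, sample_rate):
    """Duration in milliseconds of n_samples at the given sample rate."""
    return 1000.0 * n_samples / sample_rate


def ms_to_samples(duration_ms, sample_rate):
    """Number of samples (rounded) covering duration_ms at the given rate."""
    return round(duration_ms * sample_rate / 1000.0)


hop_ms = samples_to_ms(HOP_SIZE, SAMPLE_RATE)        # 12.5 ms
window_ms = samples_to_ms(WINDOW_SIZE, SAMPLE_RATE)  # about 54.2 ms
paper_hop = ms_to_samples(12.5, SAMPLE_RATE)         # 300 samples
paper_window = ms_to_samples(50.0, SAMPLE_RATE)      # 1200 samples

# Each mel frame conditions HOP_SIZE audio samples, so a vocoder has to
# upsample its conditioning features by this factor.
upsample_factor = HOP_SIZE
frames_per_second = SAMPLE_RATE / HOP_SIZE           # 80.0
```

The hop size matters most for the vocoder, since it fixes how many waveform samples each conditioning frame has to cover.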