rishikksh20 opened this issue 6 years ago
Hi, the main part of this model is similar to Tacotron 1. We can also add the style embedding part to Tacotron 2 and then integrate it with WaveNet to get better results.
OK, I will modify the r9y9/Tacotron-2 code, add your style embedding code to it, and then see how it works.
@syang1993 What about the loss function?
@rishikksh20 The style tokens are trained in an unsupervised way, so I don't think we need an extra loss unless you have a specific purpose.
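Why no extra loss is needed: in a GST model, the reference encoder's output attends over a small bank of learned style tokens, and the resulting style embedding conditions the synthesizer. The tokens only receive gradients through the usual spectrogram reconstruction loss. The sketch below is a simplified, pure-Python illustration under stated assumptions: single-head scaled dot-product attention (the GST paper uses multi-head attention), tanh applied to the tokens, and an L1 mel loss. All function names are hypothetical and not taken from this repo.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def style_token_layer(ref_embedding, tokens):
    """Hypothetical, simplified style token layer (single-head attention).

    ref_embedding: reference-encoder output, a list of d floats.
    tokens: the learned token bank, K lists of d floats each.
    Returns (style_embedding, attention_weights).
    """
    activated = [[math.tanh(v) for v in tok] for tok in tokens]
    d = len(ref_embedding)
    # Scaled dot-product score between the reference embedding and each token.
    scores = [sum(r * t for r, t in zip(ref_embedding, tok)) / math.sqrt(d)
              for tok in activated]
    weights = softmax(scores)
    # Style embedding = attention-weighted sum of the tanh-activated tokens.
    style = [sum(w * tok[i] for w, tok in zip(weights, activated))
             for i in range(d)]
    return style, weights


def reconstruction_loss(predicted_mel, target_mel):
    """Mean absolute error over all mel bins of all frames.

    This is the only training objective: no separate term is defined for the
    style tokens. In a real framework, gradients of this loss flow back
    through the attention into the tokens and the reference encoder.
    """
    diffs = [abs(p - t)
             for p_frame, t_frame in zip(predicted_mel, target_mel)
             for p, t in zip(p_frame, t_frame)]
    return sum(diffs) / len(diffs)


# Toy usage with random values (10 tokens of size 8, example sizes only).
random.seed(0)
tokens = [[random.gauss(0.0, 0.3) for _ in range(8)] for _ in range(10)]
ref = [random.gauss(0.0, 1.0) for _ in range(8)]
style, weights = style_token_layer(ref, tokens)
print(len(style), round(sum(weights), 6))
```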
@syang1993 Thanks! Is it possible to integrate Tacotron 1 with the WaveNet vocoder? The GST-Tacotron paper mentions that they tested it with WaveNet, so I think it is possible.
@rishikksh20 I tried to integrate Tacotron 1 with WaveNet, but the performance was worse than with Tacotron 2. Although the paper tested it with WaveNet, I think it's easier to do with Tacotron 2.
@syang1993 OK, got it. The issue with Tacotron 1 might be due to the receptive field width. Anyway, regarding Tacotron 2: is adding just the style embedding part of your code enough? (I could check this myself, but training Tacotron 2 takes at least a week.) I ask because the GST paper also mentions some changes to the decoder.
Sorry to ask you so many questions, but this is an urgent task for me and I have limited compute.
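The thread does not say what "receptive field width" refers to. If it means the WaveNet vocoder's receptive field (how many past samples each output sample can see), it follows from the dilated convolution stack: with kernel size $k$ and dilations $d_i$, the receptive field is $(k-1)\sum_i d_i + 1$ samples. A minimal sketch, assuming dilations that double within each cycle and reset between cycles; the layer counts and sample rate in the usage lines are example values, not settings from this thread.

```python
def receptive_field_size(total_layers, num_cycles, kernel_size,
                         dilation=lambda i: 2 ** i):
    """Receptive field, in samples, of stacked dilated causal convolutions.

    Layers are split evenly into num_cycles cycles; within a cycle the
    dilation grows as dilation(0), dilation(1), ... and then restarts.
    """
    if total_layers % num_cycles != 0:
        raise ValueError("total_layers must be divisible by num_cycles")
    layers_per_cycle = total_layers // num_cycles
    dilations = [dilation(i % layers_per_cycle) for i in range(total_layers)]
    return (kernel_size - 1) * sum(dilations) + 1


# Example: 24 layers in 4 cycles, kernel size 3 (dilations 1..32 per cycle).
samples = receptive_field_size(24, 4, 3)  # 505
print(samples, "samples =", round(1000 * samples / 22050, 1), "ms at 22050 Hz")
```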
@rishikksh20 I'm not sure what "receptive field width" means for Tacotron 1. I implemented Tacotron 2 before, and I don't think it takes that much time to train. However, if you add the style embedding and reference encoder to Tacotron 2, training will take longer. Also, the decoder in this repo doesn't perfectly match the paper; I just wanted to try the style embedding idea and see how it works. You may not need to reproduce the paper's architecture exactly; you can modify it for your own purpose.
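The thread doesn't spell out how the style embedding enters Tacotron 2. A common approach is to compute one style embedding per utterance and broadcast it across every text-encoder timestep before attention, either by concatenation or by addition. The sketch below shows both on plain Python lists; the function names are hypothetical, and this is not necessarily how this repo or the paper does it. In a TensorFlow model the same broadcast is typically done with `tf.tile` followed by `tf.concat` along the last axis.

```python
def condition_by_concat(encoder_outputs, style_embedding):
    """Append the same style embedding to every encoder timestep.

    encoder_outputs: T frames, each a list of d_enc floats.
    style_embedding: a list of d_style floats (one per utterance).
    Returns T frames of size d_enc + d_style.
    """
    return [frame + style_embedding for frame in encoder_outputs]


def condition_by_addition(encoder_outputs, style_embedding):
    """Add the style embedding to every encoder timestep (sizes must match)."""
    for frame in encoder_outputs:
        if len(frame) != len(style_embedding):
            raise ValueError("addition requires d_enc == d_style")
    return [[e + s for e, s in zip(frame, style_embedding)]
            for frame in encoder_outputs]


# Toy example: 3 encoder timesteps of size 2, a style embedding of size 2.
encoder_outputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
style_embedding = [1.0, -1.0]
concatenated = condition_by_concat(encoder_outputs, style_embedding)
added = condition_by_addition(encoder_outputs, style_embedding)
print(len(concatenated[0]), len(added[0]))  # 4 2
```

Concatenation widens the attention memory, so the attention and decoder input sizes must be adjusted accordingly; addition keeps the sizes but requires the style embedding to have the same dimension as the encoder outputs.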
Has anyone tried GST with Tacotron 2 and WaveNet? I am working on it now but don't have results yet, so this could all be for naught.
@karamarieliu Could you share your work with me? I am also working on this issue.
@rishikksh20 I am currently running into an evaluation error, so I'll post it once that is solved. Right now I have T1 (Tacotron 1) with GST and WaveNet if you want that. I'm still testing it, but it runs okay. https://github.com/karamarieliu/gst-tacotron-wavenet
@karamarieliu Do you mean GST-Tacotron 1 with wavenet_vocoder is running fine? Do you have any voice samples? I tried to integrate gst-tacotron (based on Tacotron 1) with wavenet_vocoder, but it didn't perform well. If you have any voice samples where the spectrogram is generated with gst-tacotron and the audio is synthesized with wavenet_vocoder, please share them with me. Also, the repo you mentioned doesn't explain how to use wavenet_vocoder with gst-tacotron.
@karamarieliu Could you share here how to train WaveNet (with GST Tacotron 1) and how to synthesize audio? I followed the … and figured out how to synthesize, but it would be better if you could elaborate a bit when you have time. Otherwise, please share the command to train WaveNet if possible.
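When pairing a Tacotron-style model with a separately trained WaveNet vocoder, the spectrograms the vocoder is conditioned on must be computed with the same audio settings (sample rate, FFT size, hop size, number of mel bands, frequency range, normalization) as the ones the acoustic model predicts; the Tacotron 2 paper additionally trains WaveNet on ground-truth-aligned predicted features rather than ground-truth mels. The sketch below is a hypothetical sanity check over two plain hyperparameter dicts. The key names and example values are illustrative assumptions, not copied from either repo, and normalization is not covered. It also assumes the vocoder upsamples each conditioning frame by the product of `upsample_scales`, which must then equal the hop size.

```python
import math

# Feature settings that must agree between the acoustic model and the vocoder.
# Key names are illustrative, not taken from either repository.
SHARED_KEYS = ("sample_rate", "n_fft", "hop_size", "num_mels", "fmin", "fmax")


def check_vocoder_compatibility(tacotron_hparams, wavenet_hparams):
    """Return a list of human-readable mismatches (empty list means compatible)."""
    problems = []
    for key in SHARED_KEYS:
        a, b = tacotron_hparams.get(key), wavenet_hparams.get(key)
        if a != b:
            problems.append(f"{key}: tacotron={a} wavenet={b}")
    # Each conditioning frame must be upsampled to exactly hop_size samples.
    scales = wavenet_hparams.get("upsample_scales", [])
    if math.prod(scales) != wavenet_hparams.get("hop_size"):
        problems.append(
            f"prod(upsample_scales)={math.prod(scales)} "
            f"!= hop_size={wavenet_hparams.get('hop_size')}"
        )
    return problems


# Example values only.
tacotron_hparams = {"sample_rate": 22050, "n_fft": 1024, "hop_size": 256,
                    "num_mels": 80, "fmin": 0, "fmax": 8000}
wavenet_hparams = dict(tacotron_hparams, upsample_scales=[4, 4, 4, 4])
mismatched = dict(wavenet_hparams, hop_size=275)

print(check_vocoder_compatibility(tacotron_hparams, wavenet_hparams))  # []
for problem in check_vocoder_compatibility(tacotron_hparams, mismatched):
    print(problem)
```

With a Tacotron 1-based model such as gst-tacotron, the decoder's mel output (before the post-processing network that predicts linear spectrograms) is the natural input for a mel-conditioned vocoder.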
Is there any way to integrate this code with the WaveNet vocoder?