Closed — Yeongtae closed this issue 5 years ago
Interesting — it obviously learns, but it is just babbling. I've tried a couple of different datasets meanwhile (although always starting from the LJ base model) and this never happened; the quality just gradually improved from noisy to rather clear.
So I suspect there is some misalignment between the spectrograms and the raw data you feed in.
Did you use the GTA output of Tacotron, or did you extract the features from the wavs yourself?
EDIT: Here is the progress of my original training on the LJ dataset: https://www.dropbox.com/sh/2gtunx8d1r92fqb/AADh9CJEtvHnQ7YlwNClk8X5a?dl=0
I used the GTA output from my own repository and model: https://github.com/Yeongtae/Tacotron-2
Now I see some differences in hparams.py. Here is yours:
Here is mine:
My guess is that these differences cause the error.
Ah yes, this tripped me up at first as well, because at some point Rayhane-mamah changed the default sampling rate in https://github.com/Rayhane-mamah/Tacotron-2/blob/master/hparams.py
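To catch this kind of hparams drift before wasting GPU hours, it can help to sanity-check that the GTA mels actually line up with the raw audio. Here is a minimal sketch; the sample rates, hop size, and array shapes are assumptions — substitute whatever your own hparams.py and preprocessing actually use:

```python
# Hedged sketch: check that GTA mel frames line up with the raw audio.
# The numbers below (22050 Hz, hop 275, 80 mel bins) are illustrative
# assumptions, not the values from either repository.
import numpy as np

def check_alignment(wav, mel, sample_rate_wav, sample_rate_hparams, hop_size):
    """Return a list of human-readable mismatch warnings (empty = looks OK)."""
    problems = []
    if sample_rate_wav != sample_rate_hparams:
        problems.append(
            f"sample rate mismatch: wav={sample_rate_wav}, hparams={sample_rate_hparams}"
        )
    # Each mel frame should cover hop_size samples; allow one hop of padding slack.
    expected_samples = mel.shape[0] * hop_size
    if abs(len(wav) - expected_samples) > hop_size:
        problems.append(
            f"length mismatch: wav has {len(wav)} samples, "
            f"{mel.shape[0]} mel frames x hop {hop_size} = {expected_samples}"
        )
    return problems

# Example with synthetic data: 200 mel frames of 80 bins, hop of 275 samples.
mel = np.zeros((200, 80), dtype=np.float32)
wav = np.zeros(200 * 275, dtype=np.float32)
print(check_alignment(wav, mel, 22050, 22050, 275))  # -> []
print(check_alignment(wav, mel, 16000, 22050, 275))  # -> one warning
```

If either warning fires on your GTA pairs, the vocoder is being trained on misaligned (spectrogram, audio) pairs, which would explain babbling output that never clears up.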
I'm testing it on a V100 GCP machine using my pretrained Tacotron checkpoint, but it fails to generate intelligible speech.
Do you have any idea what might be wrong?
Here is the test result: https://www.dropbox.com/s/pl2hkiot6vtlljz/WaveRNN%20test.zip?dl=0