Closed — Yeongtae closed this issue 5 years ago
Interesting — it obviously learns, but it is just babbling. I've tried a couple of different datasets meanwhile (although always starting from the LJ base model) and this never happened; the quality just gradually improved from noisy to rather clear.
So I suspect there is some misalignment between the spectrograms and the raw data you feed in.
Did you use the GTA output of Tacotron, or did you extract the features from the wavs yourself?
EDIT: Here is the progress of my original training on the LJ dataset: https://www.dropbox.com/sh/2gtunx8d1r92fqb/AADh9CJEtvHnQ7YlwNClk8X5a?dl=0
I used the GTA output from my own repository and model: https://github.com/Yeongtae/Tacotron-2
Now I see some differences in hparams.py. Here is yours:
Here is mine:
My guess is that these differences cause the error.
Ah yes, this tripped me up at first as well, because at some point Rayhane-mamah changed the default sampling rate in https://github.com/Rayhane-mamah/Tacotron-2/blob/master/hparams.py
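To catch this kind of hparams drift before wasting GPU hours, it can help to sanity-check that the GTA mels actually line up with the raw audio. Here is a minimal sketch; the sample rates, hop size, and array shapes are assumptions — substitute whatever your own hparams.py and preprocessing actually use:

```python
# Hedged sketch: check that GTA mel frames line up with the raw audio.
# The numbers below (22050 Hz, hop 275, 80 mel bins) are illustrative
# assumptions, not the values from either repository.
import numpy as np

def check_alignment(wav, mel, sample_rate_wav, sample_rate_hparams, hop_size):
    """Return a list of human-readable mismatch warnings (empty = looks OK)."""
    problems = []
    if sample_rate_wav != sample_rate_hparams:
        problems.append(
            f"sample rate mismatch: wav={sample_rate_wav}, hparams={sample_rate_hparams}"
        )
    # Each mel frame should cover hop_size samples; allow one hop of padding slack.
    expected_samples = mel.shape[0] * hop_size
    if abs(len(wav) - expected_samples) > hop_size:
        problems.append(
            f"length mismatch: wav has {len(wav)} samples, "
            f"{mel.shape[0]} mel frames x hop {hop_size} = {expected_samples}"
        )
    return problems

# Example with synthetic data: 200 mel frames of 80 bins, hop of 275 samples.
mel = np.zeros((200, 80), dtype=np.float32)
wav = np.zeros(200 * 275, dtype=np.float32)
print(check_alignment(wav, mel, 22050, 22050, 275))  # -> []
print(check_alignment(wav, mel, 16000, 22050, 275))  # -> one warning
```

If either warning fires on your GTA pairs, the vocoder is being trained on misaligned (spectrogram, audio) pairs, which would explain babbling output that never clears up.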
I'm testing it on a V100 GCP machine using my pretrained Tacotron checkpoint, but it fails to generate intelligible speech.
Do you have any idea what might be wrong?
Here is the test result: https://www.dropbox.com/s/pl2hkiot6vtlljz/WaveRNN%20test.zip?dl=0