keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License

Only noise in checkpoint audio. #354

Open marshoepial opened 3 years ago

marshoepial commented 3 years ago

I'm trying to train on my own data. I've tested with the LJSpeech dataset, which produces speech-like audio after only a few thousand steps. Yet when training on my dataset (16000 Hz), the output is still plain noise even after 40,000 steps. I'm assuming this is caused by the audio hparams, where I changed the sample rate from 20000 to 16000, but I'm not sure what to set the other values to. For 20000 Hz audio, the frame lengths are much shorter than the default setting, and I'm also not sure what the frame shift is used for. Is this something you tune by hand, or is there a way to calculate these values? Thanks.
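A minimal sketch of how the frame hparams translate into sample counts. It assumes the hparam names and defaults from keithito/tacotron's hparams.py (num_freq=1025, frame_length_ms=50, frame_shift_ms=12.5) and that audio.py derives the STFT sizes roughly this way (n_fft = (num_freq - 1) * 2, hop and window converted from milliseconds to samples); `stft_params` is a hypothetical helper, so check the conversion against your own checkout.

```python
# Hypothetical helper: converts ms-based frame hparams into STFT sizes in samples.
# Assumption: names/defaults follow keithito/tacotron hparams.py; verify locally.


def stft_params(sample_rate, num_freq=1025, frame_length_ms=50.0, frame_shift_ms=12.5):
    """Return (n_fft, hop_length, win_length), all in samples."""
    n_fft = (num_freq - 1) * 2  # FFT size depends on num_freq, not on the sample rate
    # Frame shift = hop between successive spectrogram frames.
    # Frame length = analysis window size. Both are given in milliseconds,
    # so their size in samples scales with the sample rate.
    hop_length = int(round(frame_shift_ms * sample_rate / 1000))
    win_length = int(round(frame_length_ms * sample_rate / 1000))
    return n_fft, hop_length, win_length


for sr in (20000, 16000):
    n_fft, hop, win = stft_params(sr)
    frames_per_sec = sr / hop  # spectrogram frames per second of audio
    print(f"{sr} Hz: n_fft={n_fft}, hop={hop}, win={win}, {frames_per_sec:.0f} frames/s")
```

Under these assumptions, switching sample_rate from 20000 to 16000 keeps the same 12.5 ms shift and 50 ms window (250 → 200 and 1000 → 800 samples) and the same 80 frames per second, so the ms values may not need retuning by themselves. This sketch does not rule out other causes of the noisy output.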

berkaycinci commented 3 years ago

I am facing the same problem. Did you find a solution? @DashEightMate

saharsyed commented 3 years ago

Maximum audio length = max_iters × outputs_per_step × frame_shift_ms
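A quick worked version of that formula, assuming max_iters caps the number of decoder steps at synthesis time and each step emits outputs_per_step frames spaced frame_shift_ms apart. The values 200 / 5 / 12.5 are what I believe the repo's hparams.py defaults to; confirm them in your copy. Both function names are hypothetical.

```python
import math


def max_audio_seconds(max_iters, outputs_per_step, frame_shift_ms):
    # Each decoder step emits outputs_per_step frames; each frame advances frame_shift_ms.
    return max_iters * outputs_per_step * frame_shift_ms / 1000.0


def required_max_iters(longest_clip_seconds, outputs_per_step, frame_shift_ms):
    # Smallest max_iters whose output cap covers a clip of the given length.
    ms_per_step = outputs_per_step * frame_shift_ms
    return math.ceil(longest_clip_seconds * 1000.0 / ms_per_step)


print(max_audio_seconds(200, 5, 12.5))    # 12.5
print(required_max_iters(20.0, 5, 12.5))  # 320
```

If your clips are longer than the computed cap, the synthesized audio would be cut off, so max_iters may need raising accordingly.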

saharsyed commented 3 years ago

I assume 40,000 steps is still quite few.