seastar105 closed this issue 2 years ago
It seems fine. The hyperparameters you need to check are https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L13-L21
If you change the hop_size, please also change batch_max_steps accordingly. https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L127
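As an illustration of the relation (the numbers below are hypothetical, not a recommendation): batch_max_steps is a waveform length in samples that gets sliced into frames, so it should stay a multiple of hop_size.

```yaml
# Hypothetical sketch: if hop_size were changed from 256 to 300,
# batch_max_steps should be adjusted to remain a multiple of it.
hop_size: 300          # Hop size used for feature extraction.
batch_max_steps: 8100  # Waveform length per batch; 8100 = 27 * 300.
```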
Also, the mel loss settings should be checked. You do not need to use the same values as above, but you at least need to use the correct sampling rate. https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L97-L104
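For example, a hedged sketch of what the mel loss section might look like for a hypothetical 24 kHz dataset (take the exact key names and default values from the linked config; the essential point is that the sampling rate matches your data):

```yaml
# Hypothetical mel-loss settings for 24 kHz data.
mel_loss_params:
    fs: 24000        # Must match the sampling rate of your dataset.
    fft_size: 2048
    hop_size: 300
    win_length: 1200
    window: "hann"
    num_mels: 80
    fmin: 0
    fmax: 12000
```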
Thank you, @kan-bayashi. One more question: some of the conf/*.yaml files (e.g. parallel_wavegan.v1.yaml) contain information about how long training takes. How long does it take to train HiFi-GAN for 2.5M steps? And when does HiFi-GAN start to produce reasonable sound?
> How long does it take to train HiFi-GAN for 2.5M steps?
With a single V100 GPU, it takes around two weeks.
> And when does HiFi-GAN start to produce reasonable sound?
It should produce reasonable sound at around 200k iterations. The following issue may help you: https://github.com/kan-bayashi/ParallelWaveGAN/issues/278
Sorry for the dumb question. From egs/README.md, it seems PWG can be trained with an ESPnet2 TTS recipe, and HiFi-GAN as well.
prepare the ESPnet2 TTS dataset -> clone this repo -> modify some dataset's hifigan.v1.config and copy it -> make a symlink ->
run.sh --stage 1 --conf conf/hifigan_v1.config
Are these the right steps to train HiFi-GAN from scratch on a custom dataset? And which settings are essential to modify in the config?
It seems necessary to synchronize the text2mel model's feature extraction settings with the vocoder's feature extraction settings.
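For reference, the steps above might be sketched roughly as follows. This is only an assumed outline (the dataset directory name, config path, and run.sh flags are illustrative and should be checked against egs/README.md), not a confirmed procedure:

```shell
# Clone the repo and move into the recipe directory for your dataset
# (directory name <your_dataset> is a placeholder).
git clone https://github.com/kan-bayashi/ParallelWaveGAN.git
cd ParallelWaveGAN/egs/<your_dataset>/voc1

# Copy an existing HiFi-GAN config and edit it so that the feature
# extraction settings (fs, hop_size, fft_size, num_mels, fmin/fmax)
# match those of your text2mel model.
cp ../../ljspeech/voc1/conf/hifigan.v1.yaml conf/

# Run the recipe from data preparation onward with the edited config.
./run.sh --stage 1 --conf conf/hifigan.v1.yaml
```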