[Question] How to train hifi gan with custom dataset(espnet2 tts recipe)?

kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

https://kan-bayashi.github.io/ParallelWaveGAN/

MIT License

[Question] How to train hifi gan with custom dataset(espnet2 tts recipe)? #311

Closed: seastar105 closed this issue 2 years ago

seastar105 commented 2 years ago

Sorry for the basic question. According to egs/README.md, it seems that PWG, and also HiFi-GAN, can be trained with an ESPnet2 TTS recipe.

Prepare the ESPnet2 TTS dataset -> clone this repo -> copy an existing dataset's hifigan.v1.yaml and modify it -> make the symlink -> run.sh --stage 1 --conf conf/hifigan.v1.yaml

Are these the right steps to train HiFi-GAN from scratch on a custom dataset? And which settings in the config are essential to modify?

It seems necessary to keep the vocoder's feature extraction settings in sync with the text2mel model's feature extraction settings.
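A minimal sketch of such a consistency check, assuming both sets of settings have already been loaded into plain dicts. The helper names (`resolve_win_length`, `find_mismatches`) are hypothetical, and the key names follow the vocoder-side YAML; the ESPnet2 side may name the same settings differently, so mapping them onto these keys is left as an assumption to adjust.

```python
# Hypothetical helper, not part of ParallelWaveGAN or ESPnet2.
# Key names are assumed to follow the vocoder-side config.
FEATURE_KEYS = (
    "sampling_rate", "fft_size", "hop_size", "win_length",
    "window", "num_mels", "fmin", "fmax",
)


def resolve_win_length(cfg):
    """Return a copy where a missing/None win_length falls back to fft_size.

    Assumption: a null win_length means "same as fft_size".
    """
    resolved = dict(cfg)
    if resolved.get("win_length") is None:
        resolved["win_length"] = resolved.get("fft_size")
    return resolved


def find_mismatches(text2mel_cfg, vocoder_cfg, keys=FEATURE_KEYS):
    """Map each differing key to its (text2mel, vocoder) pair of values."""
    a = resolve_win_length(text2mel_cfg)
    b = resolve_win_length(vocoder_cfg)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Illustrative values only; substitute the ones from your own recipes.
    text2mel = {"sampling_rate": 22050, "fft_size": 1024, "hop_size": 256,
                "win_length": None, "window": "hann", "num_mels": 80,
                "fmin": 80, "fmax": 7600}
    vocoder = dict(text2mel, hop_size=300, win_length=1024)
    print(find_mismatches(text2mel, vocoder))
```

An empty result means the two sides agree on every listed key; any entry in the result is a setting to reconcile before training.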

kan-bayashi commented 2 years ago

That looks fine. The hyperparameters you need to check are here: https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L13-L21

If you change hop_size, please also change batch_max_steps: https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L127
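A small sketch of how batch_max_steps could be re-derived after a hop_size change. It assumes the requirement is that batch_max_steps be divisible by hop_size, so that each training crop covers a whole number of mel frames; the function name `align_batch_max_steps` is hypothetical.

```python
def align_batch_max_steps(batch_max_steps, hop_size):
    """Round batch_max_steps down to the nearest positive multiple of hop_size.

    Assumption: the trainer expects batch_max_steps % hop_size == 0.
    """
    if hop_size <= 0:
        raise ValueError("hop_size must be a positive integer")
    # Keep at least one frame so the result is never zero.
    n_frames = max(1, batch_max_steps // hop_size)
    return n_frames * hop_size


if __name__ == "__main__":
    # Illustrative values: 8192 is already a multiple of 256,
    # but not of 300, so it gets rounded down.
    print(align_batch_max_steps(8192, 256))
    print(align_batch_max_steps(8192, 300))
```

Rounding down keeps the crop length close to the original value; rounding up would work equally well if memory allows.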

Also, check the mel loss settings. You do not need to use the same values as above, but you do at least need to use the correct sampling rate: https://github.com/kan-bayashi/ParallelWaveGAN/blob/6d4411b65f9487de5ec49dabf029dc107f23192d/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L97-L104
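A rough sketch of a sanity check for the mel loss block, assuming the config has been loaded into a dict and that the mel loss settings live under a `mel_loss_params` key with an `fs` entry. Those key names, and the helper `check_mel_loss`, are assumptions for illustration; verify them against the linked YAML.

```python
def check_mel_loss(config):
    """Return a list of human-readable problems found in the mel loss settings."""
    problems = []
    fs = config["sampling_rate"]
    # Assumed key names: "mel_loss_params" and its "fs" / "fmax" entries.
    mel = config.get("mel_loss_params", {})

    if mel.get("fs") != fs:
        problems.append(
            f"mel loss fs={mel.get('fs')} differs from sampling_rate={fs}"
        )

    # fmax cannot exceed the Nyquist frequency of the audio.
    fmax = mel.get("fmax")
    if fmax is not None and fmax > fs / 2:
        problems.append(f"mel loss fmax={fmax} exceeds Nyquist {fs / 2}")

    return problems


if __name__ == "__main__":
    # Illustrative config for 24 kHz audio with a stale 22.05 kHz mel loss.
    cfg = {"sampling_rate": 24000,
           "mel_loss_params": {"fs": 22050, "fmax": 11025}}
    for problem in check_mel_loss(cfg):
        print(problem)
```

The other mel loss values (FFT size, number of mel bins, and so on) are deliberately not compared here, since they do not have to match the feature extraction settings.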

seastar105 commented 2 years ago

Thank you, @kan-bayashi. One more question: some of the conf/*.yaml files (e.g., parallel_wavegan.v1.yaml) note how long training takes. How long does it take to train HiFi-GAN for 2.5M steps? And when does HiFi-GAN start to produce reasonable sound?

kan-bayashi commented 2 years ago

> How long does it take to train HiFi-GAN for 2.5M steps?

With a single V100, it takes around 2 weeks.

> And when does HiFi-GAN start to produce reasonable sound?

It should produce reasonable sound at around 200k iterations. The following issue may help: https://github.com/kan-bayashi/ParallelWaveGAN/issues/278
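As a back-of-the-envelope estimate, the two figures above (2.5M steps in about 2 weeks on one V100, reasonable sound at around 200k iterations) can be combined. The sketch below assumes constant training throughput, which real runs will only approximate; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_steps, total_days):
    """Average throughput implied by a full training run."""
    return total_steps / (total_days * SECONDS_PER_DAY)


def days_to_reach(target_steps, total_steps, total_days):
    """Days needed to reach target_steps, assuming constant throughput."""
    return target_steps * total_days / total_steps


if __name__ == "__main__":
    # Figures quoted in this thread: 2.5M steps in ~14 days, 200k target.
    print(f"{steps_per_second(2_500_000, 14):.2f} steps/s")
    print(f"{days_to_reach(200_000, 2_500_000, 14):.2f} days")
```

Under these assumptions the throughput works out to roughly 2 steps per second, so the 200k-iteration mark would be reached after a little over one day on the same hardware.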