kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

change the feature extraction setting of vctk config #433

Open bondio77 opened 3 weeks ago

bondio77 commented 3 weeks ago

Hi, first of all, thank you so much for providing pre-trained models from your many experiments. I want to fine-tune the pre-trained VCTK model on my own multi-speaker dataset. The VCTK config uses fft_size = 2048, hop_length = 300, win_length = 1024, but the TTS model I trained uses 1024, 256, and 1024. If I change the config to 1024, 256, 1024 to match my TTS model, will fine-tuning work? The sampling rate is 24000. Thank you!
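The change being asked about would look roughly like this in the config; this is a hedged sketch assuming the key names used in the repo's `egs/vctk/voc1` yaml configs (actual key names may differ slightly):

```yaml
# Hypothetical feature-extraction block after the proposed change;
# key names assumed from the repo's yaml config style.
sampling_rate: 24000
fft_size: 1024      # was 2048 in the pre-trained VCTK config
hop_size: 256       # was 300
win_length: 1024    # unchanged
```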

kan-bayashi commented 2 weeks ago

Sorry for the late reply. I think you should train from scratch: the generator's upsampling layers are configured so that the product of the upsample scales equals the hop length, so a model pre-trained with hop_length = 300 cannot simply be fine-tuned with hop_length = 256.

Example: hop_length = 300 -> 5 * 5 * 4 * 3 https://github.com/kan-bayashi/ParallelWaveGAN/blob/86740373ec609cb9fb192d472d2aea125041491a/egs/vctk/voc1/conf/hifigan.v1.yaml#L40
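The constraint in the example above can be sketched as a quick check; the `[5, 5, 4, 3]` values come from the reply, while the `[8, 8, 4]` factorization for hop size 256 is a hypothetical choice for illustration:

```python
from math import prod

# The generator's upsample scales must multiply to the hop size.
pretrained_scales = [5, 5, 4, 3]   # matches hop_length = 300 in the VCTK config
assert prod(pretrained_scales) == 300

# hop_length = 256 needs a different factorization (hypothetical example),
# which changes the network architecture, so pretrained weights don't transfer.
custom_scales = [8, 8, 4]
assert prod(custom_scales) == 256
```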

bondio77 commented 1 week ago

Thank you so much for answering me. I will try your recommendation. Thank you!