kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

change the feature extraction setting of vctk config #433

Open bondio77 opened 3 weeks ago

bondio77 commented 3 weeks ago

Hi, first of all, thank you so much for providing pre-trained models from your many experiments. I want to fine-tune the pre-trained VCTK model on my own multi-speaker dataset. The VCTK config uses fft_size = 2048, hop_length = 300, win_length = 1024, but the TTS model I trained uses 1024, 256, and 1024. If I change the config to 1024, 256, 1024 to match my TTS model, will fine-tuning work? The sampling rate is 24000. Thank you!
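The change being asked about would look roughly like this in the config; this is a hedged sketch assuming the key names used in the repo's `egs/vctk/voc1` yaml configs (actual key names may differ slightly):

```yaml
# Hypothetical feature-extraction block after the proposed change;
# key names assumed from the repo's yaml config style.
sampling_rate: 24000
fft_size: 1024      # was 2048 in the pre-trained VCTK config
hop_size: 256       # was 300
win_length: 1024    # unchanged
```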

kan-bayashi commented 2 weeks ago

Sorry for the late reply. I think you should train from scratch: the generator's upsampling layers are configured so that the product of the upsample scales equals the hop length, so a model pre-trained with hop_length = 300 cannot simply be fine-tuned with hop_length = 256.

Example: hop_length = 300 -> 5 * 5 * 4 * 3 https://github.com/kan-bayashi/ParallelWaveGAN/blob/86740373ec609cb9fb192d472d2aea125041491a/egs/vctk/voc1/conf/hifigan.v1.yaml#L40
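The constraint in the example above can be sketched as a quick check; the `[5, 5, 4, 3]` values come from the reply, while the `[8, 8, 4]` factorization for hop size 256 is a hypothetical choice for illustration:

```python
from math import prod

# The generator's upsample scales must multiply to the hop size.
pretrained_scales = [5, 5, 4, 3]   # matches hop_length = 300 in the VCTK config
assert prod(pretrained_scales) == 300

# hop_length = 256 needs a different factorization (hypothetical example),
# which changes the network architecture, so pretrained weights don't transfer.
custom_scales = [8, 8, 4]
assert prod(custom_scales) == 256
```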

bondio77 commented 1 week ago

Thank you so much for answering me. I will try your recommendation. Thank you!