kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License
1.54k stars, 339 forks

How would you train for BW extension? #397

Open kelseyjd opened 1 year ago

kelseyjd commented 1 year ago

I'm interested in training a model that converts 24 kHz mel spectrograms to 48 kHz waveforms (like HiFi-GAN2). It might not work without changing the architecture, but that's OK. How would you modify the config files to do this? I've already run the recipe through stage 1 to extract features from downsampled VCTK. Now I'm unsure how to modify the generator parameters in the HiFi-GAN config so that it produces a waveform twice as long.

kan-bayashi commented 1 year ago

You can simply increase the upsampling scales here: https://github.com/kan-bayashi/ParallelWaveGAN/blob/ffaa99fe77d3b0703e5857177fd9b2ecc18cb0bd/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L38-L39

E.g.,

upsample_scales: [8, 8, 4, 2]         # Upsampling scales.
upsample_kernel_sizes: [16, 16, 8, 4] # Kernel size for upsampling layers.
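
As a rough sanity check on these values, the sketch below computes the output sample rate implied by a set of upsampling scales. It assumes the mel features were extracted at 24 kHz with a hop size of 256 samples; the hop size is an assumption and is not stated in this thread, so substitute the value from your own config. The function names are hypothetical helpers, not part of ParallelWaveGAN.

```python
from math import prod


def output_sample_rate(feature_sr, hop_size, upsample_scales):
    """Return the waveform sample rate implied by the upsampling scales.

    Each mel frame spans hop_size / feature_sr seconds, and the generator
    emits prod(upsample_scales) samples per frame.
    """
    samples_per_frame = prod(upsample_scales)
    return feature_sr * samples_per_frame / hop_size


def kernels_are_twice_scales(upsample_scales, upsample_kernel_sizes):
    """Check the pattern seen in the example above: kernel size == 2 * scale."""
    return all(k == 2 * s for s, k in zip(upsample_scales, upsample_kernel_sizes))


# Assumed feature extraction settings (hop size is NOT given in the thread).
FEATURE_SR = 24000
HOP_SIZE = 256

scales = [8, 8, 4, 2]
kernels = [16, 16, 8, 4]

print(output_sample_rate(FEATURE_SR, HOP_SIZE, scales))
print(kernels_are_twice_scales(scales, kernels))
```

Under these assumptions the scales multiply to 512 samples per frame, twice the assumed hop size, which is what doubles the output rate relative to the input features.
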
kelseyjd commented 1 year ago

I see, thank you very much!

On Fri, Mar 3, 2023 at 7:17 PM Tomoki Hayashi wrote:

You can simply increase upsample scale here.

https://github.com/kan-bayashi/ParallelWaveGAN/blob/ffaa99fe77d3b0703e5857177fd9b2ecc18cb0bd/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L38-L39

E.g.,

upsample_scales: [8, 8, 4, 2]         # Upsampling scales.
upsample_kernel_sizes: [16, 16, 8, 4] # Kernel size for upsampling layers.
