yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License

Which GAN vocoders shall work with what hyper params? #35

Closed: skol101 closed this issue 2 years ago

skol101 commented 2 years ago

Am I understanding correctly that either ParallelWaveGAN or HiFi-GAN should work as the vocoder, but that the vocoder must be trained with the same parameters as the pre-trained vocoder provided in the demo? I followed https://github.com/yl4579/StarGANv2-VC/issues/8#issuecomment-914651372 to update preprocess.py, but not normalize.py because, if I understand correctly, the generated speaker stats are not used by the StarGANv2-VC model.

I first tried fine-tuning StyleMelGAN from the pre-trained VCTK StyleMelGAN model (https://github.com/kan-bayashi/ParallelWaveGAN). That didn't work at all.

Then I trained StyleMelGAN from scratch for about 225,000 steps on the same 20 speakers that StarGANv2-VC uses. While the vocoder's own predictions (generated wavs) are excellent, the results were not good when it was used together with the StarGANv2-VC model.

Both vocoder config files use the same parameters (n_mels=80, n_fft=2048, win_length=1200, hop_length=300), though the other hyperparameters are quite different from those of the pre-trained model that comes with this demo.
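To narrow down where the two pipelines diverge, it may help to compare the feature-extraction settings of the conversion model and the vocoder key by key. The following is only a minimal sketch: `diff_mel_params`, the key names, and the `log_base` entry are hypothetical and are not taken from either repository. Only the four values n_mels, n_fft, win_length and hop_length come from the post above; real config files may name these fields differently.

```python
# Hypothetical helper, not part of StarGANv2-VC or ParallelWaveGAN.
# It lists the feature-extraction settings that differ between two configs.

# Key names are assumptions; adapt them to the actual config files.
MEL_KEYS = ("n_mels", "n_fft", "win_length", "hop_length", "log_base")


def diff_mel_params(conversion_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (conversion_value, vocoder_value)} for each differing key."""
    mismatches = {}
    for key in keys:
        a = conversion_cfg.get(key)  # None if the key is missing
        b = vocoder_cfg.get(key)
        if a != b:
            mismatches[key] = (a, b)
    return mismatches


# The four values below are the ones quoted in the post.
shared = {"n_mels": 80, "n_fft": 2048, "win_length": 1200, "hop_length": 300}

# "log_base" values are placeholders that only illustrate a mismatch.
conversion_cfg = dict(shared, log_base="e")
vocoder_cfg = dict(shared, log_base="10")

print(diff_mel_params(conversion_cfg, vocoder_cfg))
```

An empty result would suggest that the listed settings agree, in which case the difference is more likely in steps a config file does not capture, such as how the log-mel features are scaled or normalized before they reach the vocoder.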

skol101 commented 2 years ago

@Kreevoz, could you share how you changed preprocess.py in ParallelWaveGAN?

MMMMichaelzhang commented 2 years ago

@skol101 have you figured out how to change preprocess.py in ParallelWaveGAN?

moh3n9595 commented 1 year ago

Same issue

roinhutovo commented 1 year ago

Same issue. It seems necessary to recreate your own ParallelWaveGAN training configuration, @yl4579.