yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License

how to fine-tune model #43

Closed MMMMichaelzhang closed 2 years ago

MMMMichaelzhang commented 2 years ago

When I fine-tuned the model, I had 20 speakers and the checkpoint was epoch_00300.pth. Now I want to add 1 speaker. How should I set this up? I changed pretrained_model in config.yml; should I then set num_domains=21? Can you tell me how to do it? Thanks.

yl4579 commented 2 years ago

You can just change the number of domains and fine-tune the model. It will add one more projection to the style encoder and mapping network while keeping the original ones, and the same for the discriminator. The generator is independent of the number of domains.
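As a rough illustration of the idea of keeping the original per-speaker projections and adding one for the new speaker, here is a minimal pure-Python sketch. It is not the repository's code: `extend_domain_heads`, the list-of-floats "heads", and the 0.02 initialisation scale are all hypothetical stand-ins, and the real model stores its projections as network layers rather than lists.

```python
import random


def extend_domain_heads(old_heads, num_domains, dim, seed=0):
    """Return `num_domains` heads: the existing ones unchanged, plus new ones.

    Hypothetical helper for illustration only; a "head" here is just a
    list of floats standing in for one speaker-specific projection.
    """
    if num_domains < len(old_heads):
        raise ValueError("num_domains must be >= the number of existing heads")
    rng = random.Random(seed)
    heads = list(old_heads)  # trained projections are carried over as they are
    while len(heads) < num_domains:
        # each added speaker gets a freshly initialised projection
        heads.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return heads


# 20 speakers were trained; one speaker is added, so num_domains becomes 21.
old_heads = [[float(i)] * 4 for i in range(20)]
new_heads = extend_domain_heads(old_heads, num_domains=20 + 1, dim=4)
print(len(new_heads))
```

The point of the sketch is only that num_domains counts all speakers (the original 20 plus the new one), so the first 20 projections can be reused and only the last one starts from scratch.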

MMMMichaelzhang commented 2 years ago

When I fine-tuned the model, I added 1 speaker and changed num_domains=1, and I got an error: RuntimeError: output with shape [1, 512, 1, 1] doesn't match the broadcast shape [32, 512, 1, @yl4579

yl4579 commented 2 years ago

It has nothing to do with the added speaker. It just happens that the size of your data mod the batch size is 1, so the last batch contains a single sample. See #42.
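The arithmetic behind this can be checked with a small sketch. The batch size of 32 is taken from the error message; the dataset size of 65 and the helper names `batch_sizes` and `trim_to_avoid_single` are hypothetical. Dropping the incomplete last batch mirrors what the `drop_last=True` option of PyTorch's `DataLoader` does; whether the repository's data loader exposes that option is an assumption to verify.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """List the size of every batch produced from `num_samples` items."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # incomplete final batch
    return sizes


def trim_to_avoid_single(items, batch_size):
    """Drop one item when the final batch would otherwise hold exactly one."""
    if len(items) % batch_size == 1:
        return list(items[:-1])
    return list(items)


# Hypothetical dataset of 65 utterances with batch size 32: 65 % 32 == 1.
print(batch_sizes(65, 32))                  # last batch has a single sample
print(batch_sizes(65, 32, drop_last=True))  # incomplete batch is skipped
print(len(trim_to_avoid_single(list(range(65)), 32)))
```

Either removing (or adding) a sample so the remainder is not 1, or dropping the incomplete last batch, avoids a final batch of size 1.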

MMMMichaelzhang commented 2 years ago

What should I do to solve this problem? @yl4579

MMMMichaelzhang commented 1 year ago

> It has nothing to do with the added speaker, it just happens that the size of your data mod the batch size is 1, see #42

I set my data to be exactly a multiple of the batch size and set num_domains=1, but I still got NaN. @yl4579

train/real : 0.4016
train/fake : 0.3439
train/reg : 0.0005
train/real_adv_cls: nan
train/con_reg : 0.0278
train/adv : 1.5941
train/sty : 0.2312
train/ds : 0.0003
train/cyc : 1.0075
train/norm : 2.6099
train/asr : 0.0490
train/f0 : 0.2077
train/adv_cls : nan

eval/real : 0.6233
eval/fake : 0.6014
eval/reg : 0.0000
eval/real_adv_cls: nan
eval/con_reg : 0.0000
eval/adv : 0.7952
eval/sty : 0.1543
eval/ds : 0.0001
eval/cyc : 0.9594
eval/norm : 1.9790
eval/asr : 0.0442
eval/f0 : 0.3179
eval/adv_cls : nan