Open wblgers opened 2 years ago
@wblgers have you seen any improvement in StyleMelGAN at higher sampling rates by changing input channels or kernel size?
conv kernel size
I'm working on it. Since multi-GPU training does not speed things up as expected, training is slow. I'll share some results when it's finished.
@wblgers Great - I'm testing increasing in_channels and will respond when I have results as well.
As for multi-GPU training, I have found that increasing the LR and reducing the number of steps between scheduled LR changes helps speed up training.
My understanding is that using the same config with multi-GPU training only increases the effective batch size. With a bigger batch, you can increase the LR in proportion to the batch size, since each optimizer step should be based on a less noisy gradient estimate.
I tried multiplying the LR by 4 for 8 GPUs and halving the number of steps between scheduler changes. I'm not sure what the best settings are, though. You may be able to scale things 1:1 for the fastest training (e.g. 8x LR for 8 GPUs and divide the number of steps by 8), but I'm not sure.
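To make the arithmetic concrete, here is a minimal sketch of the two settings I described (4x LR with half the steps, and the 1:1 variant). It assumes a step-based scheduler with milestone steps; the base LR, the milestone values, and the helper name `scale_schedule` are hypothetical and not taken from any actual config.

```python
def scale_schedule(base_lr, milestones, lr_factor, step_divisor):
    """Scale the LR up and move the scheduler milestones earlier.

    base_lr      -- single-GPU learning rate (hypothetical value below)
    milestones   -- steps at which the scheduler changes the LR
    lr_factor    -- multiplier applied to the LR
    step_divisor -- divisor applied to every milestone step
    """
    scaled_lr = base_lr * lr_factor
    # Integer division keeps milestones as whole steps; never go below step 1.
    scaled_milestones = [max(1, m // step_divisor) for m in milestones]
    return scaled_lr, scaled_milestones


# Hypothetical single-GPU settings, for illustration only.
base_lr = 1e-4
milestones = [100000, 200000, 300000]

# What I tried on 8 GPUs: 4x LR, half the steps between scheduler changes.
lr_a, ms_a = scale_schedule(base_lr, milestones, lr_factor=4, step_divisor=2)

# The untested 1:1 variant: 8x LR, steps divided by 8.
lr_b, ms_b = scale_schedule(base_lr, milestones, lr_factor=8, step_divisor=8)

print(lr_a, ms_a)
print(lr_b, ms_b)
```

Whether the 1:1 variant actually trains stably is something I have not verified; a very large LR may need warmup.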
Dear professor,
I'd like to train a StyleMelGAN vocoder at 32 kHz. Here is my config for training a multi-speaker model. At the moment, the speaker similarity on the VC task is worse than with fregan/hifigan. Can you give me some advice on improving the quality? Here are two things I want to try: (1) increase in_channels from 128 to 256; (2) increase the conv kernel size. Thanks!