Open Oktai15 opened 3 years ago
@jik876 I see batch_size=16 in the config, but I want to clarify: was the batch size 16 per GPU? And did you use 2 V100s for training with this batch size?
https://github.com/jik876/hifi-gan/blob/master/train.py#L259: the total batch size is constant regardless of the number of GPUs.
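If it helps, here is a minimal, self-contained sketch of the idea (not the exact code at that line; the numbers and variable names are made up for illustration, and the real value comes from `batch_size` in the config):

```python
import torch

# Hypothetical values for illustration; the real value is "batch_size" in config_v1.json.
total_batch_size = 16
num_gpus = max(torch.cuda.device_count(), 1)

# train.py divides the configured batch size by the number of GPUs,
# so the total batch size stays the same however many GPUs you use;
# only the per-GPU share changes.
batch_size_per_gpu = total_batch_size // num_gpus
print(f"Total batch size: {total_batch_size}, per GPU: {batch_size_per_gpu}")
```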
@CookiePPP great, thanks! Then I need the total batch size that was used :)
I found that when there is only ONE speaker in the training dataset and I changed the batch_size to 16 * num(GPU), the resulting waveform contains some noise like reverberation, which did not happen when I used a dataset with TWO or MORE speakers.
@JohnHerry I ran into the same problem. What do you think the reason is: the single-speaker dataset, or the batch_size?
@yyggithub I do not really know the reason. But I noticed that the HiFi-GAN MelDataset "shuffle" is disabled on multi-GPU training. With torch distributed multi-GPU training, the training process on each GPU is somewhat more "standalone". My guess is that if each GPU always sees the same fixed subset of the whole training dataset, then the bigger the batch_size, the larger the inconsistency between GPUs becomes.
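Roughly what I mean, as a minimal sketch with a toy dataset (not the actual MelDataset code; the world size of 2 is an assumption for illustration). With DDP, per-rank shuffling is usually delegated to DistributedSampler, and set_epoch() has to be called every epoch or each rank keeps seeing the same samples in the same order:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-in for MelDataset, just to show the sampler behaviour.
dataset = TensorDataset(torch.arange(1000).float())

world_size = 2  # assumed number of GPUs for illustration
rank = 0        # this process's rank

# With explicit num_replicas/rank, DistributedSampler works without init_process_group.
# It shards the dataset so each rank sees a different, non-overlapping subset.
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)

# The DataLoader itself must not shuffle when a sampler is supplied.
loader = DataLoader(dataset, batch_size=16, shuffle=False, sampler=sampler, drop_last=True)

for epoch in range(3):
    # Without set_epoch(), every epoch reuses the same permutation,
    # so each rank keeps seeing the same shard in the same order.
    sampler.set_epoch(epoch)
    for (batch,) in loader:
        pass  # training step would go here
```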
Good. Have you tried single-speaker data on a single GPU? Does that have this problem?
Yes, training with a single-speaker dataset on a single GPU is fine.