jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

What batch size was used in the paper? #79

Open Oktai15 opened 3 years ago

Oktai15 commented 3 years ago

@jik876 I see batch_size=16 in the config, but I want to clarify: the batch size was 16 per GPU, right? And you used 2 V100s for training with this batch size?

CookiePPP commented 3 years ago

https://github.com/jik876/hifi-gan/blob/master/train.py#L259: the total batch size is constant regardless of the number of GPUs.
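
For anyone reading along, a rough sketch of what that line implies (my paraphrase, not a verbatim excerpt of train.py): the batch_size in the config is treated as the total batch size and is divided by the number of GPUs before each process builds its DataLoader. The variable names below are illustrative, not the repo's.

```python
# Hedged sketch of the per-GPU split described above (not the repo's exact code).
import torch

total_batch_size = 16                          # "batch_size" in config_v1.json
num_gpus = max(torch.cuda.device_count(), 1)   # number of training processes

# The total batch size stays constant; each GPU process gets its share.
if num_gpus > 1:
    per_gpu_batch_size = total_batch_size // num_gpus
else:
    per_gpu_batch_size = total_batch_size

# e.g. with 2 GPUs: 16 total -> 8 samples per GPU per optimizer step
print('Batch size per GPU:', per_gpu_batch_size)
```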

Oktai15 commented 3 years ago

@CookiePPP great, thanks! Then what I need to know is the total batch size that was actually used :)

JohnHerry commented 3 years ago

I found that when there is only ONE speaker in the training dataset and I changed batch_size to 16 * num(GPUs), the resulting waveform contained some noise, like reverberation, which did not happen when I used a dataset with TWO or MORE speakers.

yygg678 commented 3 years ago

@JohnHerry I ran into the same problem. What do you think is the cause: the single-speaker dataset or the batch size?

JohnHerry commented 3 years ago

@yyggithub I do not really know the reason. But I noticed that the HiFi-GAN MelDataset "shuffle" is turned off for multi-GPU training. With torch distributed multi-GPU training, the training process on each GPU is somewhat more "stand-alone". My guess is that if different GPUs always see the same fixed subset of the whole training dataset, then the bigger the batch size, the larger the inconsistency between GPUs becomes.
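
To make that guess concrete, here is a toy, hedged sketch in plain PyTorch (not HiFi-GAN's train.py) of how DistributedSampler partitions a dataset across ranks, and why calling set_epoch() matters if you want each GPU's subset to be reshuffled between epochs:

```python
# Illustration of the concern above, not the repo's code: each rank of a
# DistributedSampler draws from its own partition of the dataset. Without
# set_epoch(), shuffle=True reuses the same permutation every epoch, so a
# given GPU keeps seeing the same fixed subset of files.
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def build_loader(dataset, rank, world_size, batch_size):
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank,
                                 shuffle=True)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler,
                        shuffle=False, drop_last=True)
    return loader, sampler

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(32))          # stand-in for MelDataset
    loader, sampler = build_loader(dataset, rank=0, world_size=2, batch_size=8)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # without this, rank 0 sees identical batches every epoch
        print(f"epoch {epoch}:", [batch[0].tolist() for batch in loader])
```

The point is only that the per-rank subsets change across epochs when the sampler is reshuffled; whether a fixed subset is really what causes the reverberation-like noise on a single-speaker set is still a guess.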

yygg678 commented 3 years ago

Good. Have you tried single-speaker data on a single GPU, and does it have this problem too?

JohnHerry commented 3 years ago

Yes, training with a single-speaker dataset on a single GPU is fine.