Slow training on multiple GPUs

rosinality / style-based-gan-pytorch

Implementation of A Style-Based Generator Architecture for Generative Adversarial Networks in PyTorch

Slow training on multiple GPUs #78

Closed: albusdemens closed this issue 4 years ago

albusdemens commented 4 years ago

When I use multiple GPUs instead of one, training speed drops by about 90% (see below). Is there a way to fix this?

A similar issue has previously been reported (and fixed), but changing num_workers doesn't help in my case.

Training speed:
One GPU: 10.71 it/s
Four GPUs: 1.13 it/s

CUDA_VISIBLE_DEVICES=4 python train.py --mixing --loss wgan-gp ./Maps_512px_high/
Size: 8; G: 29.625; D: -16.795; Grad: 1.407; Alpha: 1.00000:   0%| | 132/3000000 [00:12<77:49:10, 10.71it]

rosinality commented 4 years ago

This implementation is not very efficient in multi-GPU settings, but I don't know why the slowdown occurs. Maybe at resolution 8 the model is quite small, so communication cost dominates training. Could you test at a larger resolution?
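To illustrate the hypothesis about communication cost, here is a toy cost model in plain Python. It assumes compute time splits evenly across GPUs while each multi-GPU step pays a fixed synchronization cost. The function name and all timings are hypothetical, chosen only to show the shape of the effect; they are not measurements from this repository.

```python
def iterations_per_second(compute_s, n_gpus, sync_s):
    """Toy model: compute splits evenly, sync cost is fixed per step.

    compute_s: seconds of compute per iteration on one GPU (hypothetical)
    sync_s:    seconds spent on scatter/gather per iteration (hypothetical)
    """
    overhead = sync_s if n_gpus > 1 else 0.0
    return 1.0 / (compute_s / n_gpus + overhead)


# Hypothetical per-step costs in seconds, NOT taken from this thread.
SYNC_S = 0.5
for label, compute_s in [("small model", 0.1), ("large model", 4.0)]:
    single = iterations_per_second(compute_s, 1, SYNC_S)
    multi = iterations_per_second(compute_s, 4, SYNC_S)
    print(f"{label}: 1 GPU = {single:.2f} it/s, 4 GPUs = {multi:.2f} it/s")
```

Under these assumptions the small model gets slower on four GPUs because the fixed overhead outweighs the compute saved, while the large model gets faster.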

albusdemens commented 4 years ago

OK. By the way, congratulations on your work, it's very well done!

I benchmarked the speed starting at different resolutions and this is what I got:

init_size = 16 --> 1 GPU = 6 it/s, 4 GPUs = 1.4 it/s
init_size = 32 --> 1 GPU = 2 it/s, 4 GPUs = 1.5 it/s
init_size = 64 --> 1 GPU = 1.4 it/s, 4 GPUs = 1.75 it/s

So it looks like using multiple GPUs only pays off when starting from a large size, which on the other hand goes against the StyleGAN philosophy, if I understood correctly (this might prevent the formation of nice large-scale features). Another solution could be to reduce the batch size. I'll try that as soon as I have time.
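For reference, the relative speed of four GPUs versus one can be computed directly from the figures reported in this thread. This is only a small arithmetic sketch: the size 8 entry assumes the 1.13 it/s figure from the first post was also measured at size 8, and the `speedup` helper is a name made up here.

```python
# (1-GPU it/s, 4-GPU it/s) per starting size, as reported in this thread.
# Assumption: the size 8 pair comes from the first post's run.
measurements = {
    8: (10.71, 1.13),
    16: (6.0, 1.4),
    32: (2.0, 1.5),
    64: (1.4, 1.75),
}


def speedup(single, multi):
    """Ratio of 4-GPU to 1-GPU throughput; below 1 means a slowdown."""
    return multi / single


speedups = {size: speedup(*pair) for size, pair in measurements.items()}
for size, ratio in speedups.items():
    print(f"size {size}: 4 GPUs run at {ratio:.2f}x the 1-GPU speed")
```

The ratio grows steadily with the starting size and only exceeds 1 at init_size = 64. Throughput across different sizes is not directly comparable, since the work per iteration changes with resolution.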

rosinality commented 4 years ago

Hmm, I will look into it. By the way, I recommend StyleGAN2 if you don't need progressive training.

albusdemens commented 4 years ago

Thanks. Yes, I plan to give StyleGAN2 a try too (I still need to properly study the paper, oops). Another thing: where do you define which latent vector(s) to use?

rosinality commented 4 years ago

You can set mixing_range to specify the range of layers that will use the second latent vector.
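As a rough illustration of what such a range means, the sketch below maps each generator layer to the latent vector it would use. It is plain Python and not the repository's code: the helper name is hypothetical, and it assumes mixing_range is an inclusive (start, end) pair of layer indices, which should be checked against model.py.

```python
def latent_index_per_layer(n_layers, mixing_range):
    """Return, for each layer, 0 (first latent) or 1 (second latent).

    Assumption: mixing_range is an inclusive (start, end) pair of layer
    indices; layers inside the range use the second latent vector.
    """
    start, end = mixing_range
    return [1 if start <= i <= end else 0 for i in range(n_layers)]


# Example with 9 layers, where the first four layers take the second latent.
print(latent_index_per_layer(9, (0, 3)))
```

With a low range such as (0, 3), the second latent would control the coarse layers and the first latent the finer ones.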

albusdemens commented 4 years ago

All clear. Is the first latent vector explicitly defined somewhere in the code?

rosinality commented 4 years ago

StyledGenerator can take a list of latent vectors, so you can pass a list of noise tensors.