junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Multi GPU speed? #1432

Open · gabgren opened 2 years ago

gabgren commented 2 years ago

Hi!

I was under the assumption that using multiple GPUs for training pix2pix would result in faster training, but this is not what I am experiencing. In fact, I get slower speeds; the best I can do is keep the s/it more or less the same as with 1 GPU.

For testing, I was using batch_size 8 for a single GPU and batch_size 64 for 8 GPUs. Tests were done on 8x A6000 and 8x 3090. I have also tried setting norm to both instance and batch, with no effect.

What am I doing wrong, or misunderstanding? Am I right to expect faster training with more GPUs, or is it that using multiple gpu_ids only lets me train at higher resolution?
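
For reference, the 8-GPU run was launched roughly like this (the dataset path and experiment name below are placeholders):

```
python train.py --dataroot ./datasets/my_dataset --name pix2pix_8gpu \
                --model pix2pix --gpu_ids 0,1,2,3,4,5,6,7 --batch_size 64
```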

Thanks!

taesungp commented 2 years ago

Could you check if the GPU utilization is at 100%? It could be that the data loader does not feed training images fast enough. Another possibility is that progress in terms of the total number of images used for training is actually faster with more GPUs, but if you are monitoring the number of iterations, it won't look different.
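
One quick way to check is to watch per-GPU utilization while training runs, for example with nvidia-smi:

```
# Poll per-GPU utilization and memory once per second during training.
watch -n 1 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv
```

If utilization repeatedly drops toward 0% between iterations, data loading is likely the bottleneck.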

gabgren commented 2 years ago

Looks like it's your first theory: it takes a long time to feed the 8 GPUs. The actual processing seems to be faster, but things stall between iterations. See this comparison of GPU utilization for 1x A6000 vs 8x A6000: 1gpu 8gpus

How can I speed this up?

junyanz commented 2 years ago

It might be a data loading issue. You may want to use an SSD or another fast file system.
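
Increasing the number of data-loading workers may also help. The --num_threads option (which is passed to the DataLoader's num_workers) defaults to 4, so something along these lines might be worth trying (dataset path and experiment name are placeholders):

```
python train.py --dataroot ./datasets/my_dataset --name pix2pix_8gpu \
                --model pix2pix --gpu_ids 0,1,2,3,4,5,6,7 --batch_size 64 --num_threads 16
```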

malinjie-hub commented 2 years ago

I have 4 GPUs and want to use all 4 for accelerated training at the same time. How can I modify the code? At present, it only trains on one GPU, and the training speed is very slow. Thank you!

icelandno1 commented 2 years ago

@gabgren I have 4 GPUs and want to use all 4 for accelerated training at the same time. How can I modify the code? At present, it only trains on one GPU, and the training speed is very slow; --gpu_ids 0,1,2,3 does not work. Thank you!

junyanz commented 2 years ago

What is your batch_size? By "does not work", do you mean (1) the model is only trained on one GPU, or (2) the model is trained on multiple GPUs, but the training speed is not as fast as you expect?

icelandno1 commented 2 years ago

@junyanz batch_size is 4. After using --gpu_ids 0,1,2,3, the model is still only trained on one GPU.

taesungp commented 2 years ago

This could be because of the limitations of nn.DataParallel, which we use here; it was a common approach when we published the repo. But it does suffer from suboptimal GPU utilization because the data loading is inefficient. A better way would be to use DistributedDataParallel link. We don't plan to support this for now, but if someone could create a PR, I'd appreciate it.
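
For anyone interested in such a PR, here is a minimal sketch of the DistributedDataParallel pattern. It is not wired into this repo's option/model classes; Pix2PixDataset and build_generator below are stand-ins, and only an L1 loss is shown:

```python
# Minimal DDP sketch; launch with: torchrun --nproc_per_node=4 train_ddp.py
# Pix2PixDataset and build_generator are placeholders, not classes from this repo.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def main():
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    dataset = Pix2PixDataset()                     # placeholder aligned-pairs dataset
    sampler = DistributedSampler(dataset)          # each rank reads its own shard
    loader = DataLoader(dataset, batch_size=8, sampler=sampler,
                        num_workers=8, pin_memory=True)

    net = build_generator().cuda(local_rank)       # placeholder generator factory
    net = DDP(net, device_ids=[local_rank])
    optimizer = torch.optim.Adam(net.parameters(), lr=2e-4)

    for epoch in range(200):
        sampler.set_epoch(epoch)                   # reshuffle shards every epoch
        for real_A, real_B in loader:
            real_A = real_A.cuda(local_rank, non_blocking=True)
            real_B = real_B.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = torch.nn.functional.l1_loss(net(real_A), real_B)  # GAN losses omitted
            loss.backward()                        # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The key difference from nn.DataParallel is that each process owns one GPU and its own DataLoader workers, so data loading scales with the number of GPUs instead of being funneled through a single process.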