Open · skq-cuhk opened 3 years ago
I observed the same problem. Do you have any solution?
Adding torch.cuda.set_device(rank) at the beginning of the training function might help.
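For concreteness, a minimal sketch of where that call would go, assuming a single-node mp.spawn-style launcher (the toy model, port, and world size are placeholders, not part of this repo):

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank, world_size):
    # Pin this process to its own GPU *before* any other CUDA call;
    # otherwise each rank may also create a context on GPU 0.
    torch.cuda.set_device(rank)

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(10, 10).to(rank)         # toy stand-in for the real model
    ddp_model = DDP(model, device_ids=[rank])  # constructor broadcast stays on this GPU

    # ... training loop ...

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```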
It works!!! God bless you.
Thanks for the great work! I noticed that the GPU load is unbalanced: there are 7 additional processes on GPU 0, each taking roughly 500+ MB of GPU memory. These extra processes are triggered by self._distributed_broadcast_coalesced() inside torch.nn.parallel.DistributedDataParallel when the DDP model is instantiated. Do you have any idea how to balance the memory requirement across the GPUs? Thank you.
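One way to confirm this kind of imbalance, a small sketch assuming nvidia-smi is on the PATH: with a balanced setup there should be exactly one training PID per GPU, while duplicated PIDs on GPU 0 indicate stray CUDA contexts from the other ranks.

```python
import subprocess

# List which GPU each compute process is attached to and how much
# memory it holds; extra entries on GPU 0 are the stray contexts.
out = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=gpu_bus_id,pid,used_memory",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
```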