Open · skq-cuhk opened 3 years ago
I observed the same problem. Do you have any solution?
Adding torch.cuda.set_device(rank) at the beginning of the training function might help.
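For concreteness, a minimal sketch of where that call would go, assuming a single-node mp.spawn-style launcher (the toy model, port, and world size are placeholders, not part of this repo):

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank, world_size):
    # Pin this process to its own GPU *before* any other CUDA call;
    # otherwise each rank may also create a context on GPU 0.
    torch.cuda.set_device(rank)

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(10, 10).to(rank)         # toy stand-in for the real model
    ddp_model = DDP(model, device_ids=[rank])  # constructor broadcast stays on this GPU

    # ... training loop ...

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```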
It works!!! God bless you.
Thanks for the great work! I noticed that the GPU load is unbalanced: there are 7 additional processes on GPU 0, each taking roughly 500+ MB of GPU memory. These extra processes are triggered by self._distributed_broadcast_coalesced() inside torch.nn.parallel.DistributedDataParallel when the DDP model is instantiated. Do you have any idea how to balance the memory requirement across the GPUs? Thank you.
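One way to confirm this kind of imbalance, a small sketch assuming nvidia-smi is on the PATH: with a balanced setup there should be exactly one training PID per GPU, while duplicated PIDs on GPU 0 indicate stray CUDA contexts from the other ranks.

```python
import subprocess

# List which GPU each compute process is attached to and how much
# memory it holds; extra entries on GPU 0 are the stray contexts.
out = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=gpu_bus_id,pid,used_memory",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
```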