Closed yifuwang closed 6 months ago
Hi @yifuwang!
Is this change something that you would recommend in general? Every resource online puts set_device after init_process_group().
Could you elaborate on why this is necessary with ENABLE_INTRA_NODE_COMM, and what the differences are (if any) between setting the device before or after?
Thank you!
Hey @carmocca,
Is this change something that you would recommend in general?
Without ENABLE_INTRA_NODE_COMM, I don't think it matters as long as you set the correct device before the first collective. There are instances of set_device before init_process_group in the links you posted (e.g. https://pytorch.org/docs/stable/distributed.html#launch-utility and https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel).
Could you elaborate why is this necessary with ENABLE_INTRA_NODE_COMM
Technically it's not a hard requirement. It's just that the feature is still new and experimental, and we're still figuring out the UX. I'm curious whether this constraint is causing issues in your project aside from the inconvenience. Thanks!
Let me run Lightning's CI with the order changed to see if any issues pop up.
To leverage the low-latency intra-node comm in c10d (https://github.com/pytorch/pytorch/pull/114001), torch.cuda.set_device() needs to be invoked before init_process_group().
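A minimal sketch of the ordering described above, assuming a torchrun-style launcher that sets LOCAL_RANK (the script name and world size in the usage note are placeholders; this needs multiple NCCL-capable GPUs to actually run):

```python
import os

import torch
import torch.distributed as dist

# Opt in to the experimental low-latency intra-node transport.
os.environ["ENABLE_INTRA_NODE_COMM"] = "1"

# torchrun sets LOCAL_RANK for each worker process.
local_rank = int(os.environ["LOCAL_RANK"])

# Bind this process to its GPU *before* init_process_group(),
# as required when ENABLE_INTRA_NODE_COMM is set.
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

# By the time the first collective runs, the device is already correct.
t = torch.ones(1, device="cuda")
dist.all_reduce(t)

dist.destroy_process_group()
```

Launched with e.g. `torchrun --nproc-per-node=8 script.py`. As noted earlier in the thread, without ENABLE_INTRA_NODE_COMM the placement of set_device only matters relative to the first collective, not relative to init_process_group.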