JohnLLLL opened 4 years ago
Note that we are actually deprecating single-process-multiple-device mode from DDP and plan to only support single-process-single-device: https://github.com/pytorch/pytorch/issues/47012.
I don't think gather and scatter are called anymore after the deprecation. @SciPioneer we should probably remove gather and scatter as well, since they are now unused?
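For context, the pattern that remains supported after the deprecation looks like the sketch below: one process wrapping one model replica on one device. This is a minimal illustration using the gloo backend on CPU with a single process; real jobs typically launch one process per device (e.g. via torchrun) and read rank/world size from the environment. The helper names here are illustrative, not part of the DDP API.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_single_process():
    # Illustrative single-process setup; torchrun would normally set these
    # environment variables and pass rank/world_size per launched process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

def main():
    setup_single_process()
    # Single-process-single-device: one replica, one device (CPU here).
    # The deprecated mode instead passed multiple entries in device_ids.
    model = torch.nn.Linear(4, 2)
    ddp_model = DDP(model)
    out = ddp_model(torch.randn(8, 4))
    dist.destroy_process_group()
    return out.shape

if __name__ == "__main__":
    print(main())
```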
Although, note that a lot of the heavy lifting for DDP actually happens in the c10d reducer, which has a lot of CUDA dependencies (ex: https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/reducer.cpp#L592). AMD ROCm got around this by basically mimicking CUDA APIs using hipify (see https://github.com/pytorch/pytorch/blob/c371542efc31b1abfe6f388042aa3ab0cef935f2/c10/cuda/README.md and https://github.com/pytorch/pytorch/blob/ba694520e5004b74b575614f9d7f86a26436d61b/tools/amd_build/build_amd.py).
I think the only way to support this would be to make the c10d reducer device-agnostic. cc @zhaojuanmao @rohan-varma
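To make that concrete, here is a conceptual pure-Python sketch of the two reducer steps the CUDA-dependent code implements: grouping gradients into fixed-capacity buckets, then averaging each bucket across processes (what an allreduce achieves). The function names `bucket_gradients` and `allreduce_mean` are illustrative only and do not correspond to the actual reducer API; the real reducer.cpp interleaves these steps with CUDA streams and events, which is exactly the coupling a device-agnostic version would have to abstract.

```python
def bucket_gradients(grads, bucket_cap):
    """Group flat gradient lists into buckets of at most bucket_cap elements.

    Mirrors (conceptually) how the reducer batches small gradients so each
    allreduce moves a reasonably large chunk of data.
    """
    buckets, current, size = [], [], 0
    for g in grads:
        if current and size + len(g) > bucket_cap:
            buckets.append(current)
            current, size = [], 0
        current.append(g)
        size += len(g)
    if current:
        buckets.append(current)
    return buckets

def allreduce_mean(per_process_grads):
    """Elementwise-average one flat gradient list across processes.

    Stands in for the collective allreduce the reducer launches per bucket.
    """
    n = len(per_process_grads)
    return [sum(vals) / n for vals in zip(*per_process_grads)]

if __name__ == "__main__":
    # Three parameter gradients, bucketed with capacity 3 elements.
    print(bucket_gradients([[1, 2], [3], [4, 5, 6]], 3))
    # Averaging the same gradient across two simulated processes.
    print(allreduce_mean([[1.0, 2.0], [3.0, 4.0]]))
```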
This may be something to consider in composable DDP. cc @mrshenli @SciPioneer
Hi,
I am investigating extending DistributedDataParallel to accelerator devices other than CUDA, not only to support single-process-single-device but also single-process-multiple-devices and multiple-processes-multiple-devices.
There are a lot of CUDA dependencies in DistributedDataParallel.
My question is:
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23