🚀 Feature
Consider the following piece of code
`idist.all_gather()` is used to collect the tensor from all processes even though only rank 0 needs it. `gather()` would be the natural choice, but the `nccl` backend does not support it. See here.

The idea is to implement a `gather()` method in `idist` that uses `all_gather()` for `nccl` and the native `gather()` for other backends. Note that `reduce()` for `gloo` on GPU could be implemented with `all_reduce()` in a similar way.

It needs tests + docs.
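
A minimal sketch of the proposed fallback, to make the idea concrete. This is not ignite's implementation: `gather_via_all_gather` and `simulate` are hypothetical names, and the collective is passed in as a callable so the snippet runs in a single process without a distributed backend.

```python
from typing import Any, Callable, List, Optional


def gather_via_all_gather(
    value: Any,
    rank: int,
    all_gather: Callable[[Any], Any],
    dst: int = 0,
) -> Optional[Any]:
    """Hypothetical helper: emulate gather() on top of all_gather()."""
    # Every process must call the collective, otherwise the others would hang.
    gathered = all_gather(value)
    # Only the destination rank keeps the result; the other ranks drop it.
    return gathered if rank == dst else None


def simulate(world_size: int, dst: int = 0) -> List[Optional[Any]]:
    """Single-process stand-in for a process group, for illustration only."""
    values = [10 * r for r in range(world_size)]

    def fake_all_gather(_value: Any) -> List[int]:
        # A real all_gather returns the values of all ranks to every rank.
        return list(values)

    return [
        gather_via_all_gather(values[r], r, fake_all_gather, dst)
        for r in range(world_size)
    ]


results = simulate(3)
```

In a real program one would presumably pass `idist.all_gather` and `idist.get_rank()`, and take this path only when `idist.backend()` reports `nccl`. The fallback does not reduce communication, since every rank still receives the full result; it only gives callers a uniform `gather()` across backends.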