Open stephanie-wang opened 1 month ago
What is the difference between this CPU-based NCCL communicator and the mock NCCL used in tests? Or is this not for testing, but a fallback to CPU/shared memory when NCCL is not available?
Assigned to @tfsingh and @anyadontfly. I think this is mainly for tests. For starters, we should work on all-reduce first. Here's a good picture to explain collectives.
Actually it would be good to make this work for non-testing purposes, so that users can debug DAGs with collective ops on CPU.
Commenting for assignment
Who's going to take this task?
@tfsingh and @anyadontfly are working on this.
Description
For development and debugging, it's useful to be able to run compiled graphs that contain NCCL transport hints using a CPU-based communicator. The communicator could use the Ray object store / a Ray actor to perform p2p and collective ops.
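To make the idea concrete, here is a minimal sketch of what a CPU-only communicator could look like. It is not Ray's API and not the implementation planned for this issue: `CPUCommunicator`, `run_demo`, and every method name are hypothetical, ranks are simulated with threads in a single process, and shared Python lists and queues stand in for the Ray object store or coordinating actor mentioned above.

```python
import queue
import threading
from typing import Callable, List, Sequence


class CPUCommunicator:
    """Hypothetical in-process stand-in for a NCCL group (ranks are threads)."""

    def __init__(self, world_size: int):
        self.world_size = world_size
        # Shared state plays the role of the object store / coordinating actor.
        self._barrier = threading.Barrier(world_size)
        self._slots: List = [None] * world_size
        # One mailbox per ordered (src, dst) pair for p2p transfers.
        self._mailboxes = {
            (s, d): queue.Queue()
            for s in range(world_size)
            for d in range(world_size)
            if s != d
        }

    def send(self, value, src: int, dst: int) -> None:
        self._mailboxes[(src, dst)].put(value)

    def recv(self, src: int, dst: int):
        # Blocks until the matching send has happened.
        return self._mailboxes[(src, dst)].get()

    def allreduce(self, rank: int, values: Sequence[float],
                  op: Callable = sum) -> List[float]:
        self._slots[rank] = list(values)
        self._barrier.wait()  # every rank has published its contribution
        result = [op(column) for column in zip(*self._slots)]
        self._barrier.wait()  # every rank has read before slots are reused
        return result


def run_demo(world_size: int = 3):
    comm = CPUCommunicator(world_size)
    results: List = [None] * world_size
    received: List = []

    def worker(rank: int) -> None:
        # p2p: rank 0 sends a message to rank 1.
        if rank == 0:
            comm.send("hello", src=0, dst=1)
        elif rank == 1:
            received.append(comm.recv(src=0, dst=1))
        # Collective: element-wise sum across all ranks.
        results[rank] = comm.allreduce(rank, [rank, 10 * rank])

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, received


if __name__ == "__main__":
    print(run_demo(3))
```

With three ranks, each rank contributes `[rank, 10 * rank]`, so every rank should end up with the same element-wise sum. The second barrier in `allreduce` keeps a fast rank from overwriting its slot for the next collective before slower ranks have read the current one.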
Use case
No response