rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
https://docs.rapids.ai/api/raft/stable/
Apache License 2.0

[BUG] mpi_comms comm_split does not split NCCL communicator #75

Open seunghwak opened 4 years ago

seunghwak commented 4 years ago

https://github.com/rapidsai/raft/blob/branch-0.16/cpp/include/raft/comms/mpi_comms.hpp#L136

  std::unique_ptr<comms_iface> comm_split(int color, int key) const {
    MPI_Comm new_comm;
    MPI_TRY(MPI_Comm_split(mpi_comm_, color, key, &new_comm));
    return std::unique_ptr<comms_iface>(new mpi_comms(new_comm, true));
  }

mpi_comms uses MPI for P2P and NCCL for collectives, but comm_split splits only the MPI communicator; the NCCL communicator is not split.
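To make the expected result of a split concrete, here is a minimal sketch in plain C++ (no MPI or NCCL needed) that models the grouping rule documented for MPI_Comm_split: processes with the same color form one group, ordered by key, with ties broken by rank in the old communicator. The names `sub_comm_info` and `model_comm_split` are hypothetical helpers for illustration, not part of RAFT, and the MPI_UNDEFINED color is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of RAFT): the size and rank a process
// would see in its sub-communicator after a split.
struct sub_comm_info {
  int size;
  int rank;
};

// Models the MPI_Comm_split grouping rule. colors[i] and keys[i] are the
// arguments passed by the process whose rank in the old communicator is i.
// Returns, for each old rank, its size and rank in the resulting sub-group.
inline std::vector<sub_comm_info> model_comm_split(const std::vector<int>& colors,
                                                   const std::vector<int>& keys) {
  assert(colors.size() == keys.size());
  const int n = static_cast<int>(colors.size());
  std::vector<sub_comm_info> result(n, sub_comm_info{0, 0});
  for (int i = 0; i < n; ++i) {
    // Collect (key, old rank) for every process sharing old rank i's color.
    std::vector<std::pair<int, int>> members;
    for (int j = 0; j < n; ++j) {
      if (colors[j] == colors[i]) members.emplace_back(keys[j], j);
    }
    // Sorting pairs orders by key first, then by old rank to break ties.
    std::sort(members.begin(), members.end());
    const auto it = std::find(members.begin(), members.end(), std::make_pair(keys[i], i));
    result[i].size = static_cast<int>(members.size());
    result[i].rank = static_cast<int>(it - members.begin());
  }
  return result;
}
```

For a 4-rank job calling comm_split with colors {0, 1, 0, 1} and equal keys, each resulting sub-group has size 2. For collectives to operate on the same sub-group, the NCCL communicator used by the returned object would presumably need to be created with that sub-group size and rank (the `nranks` and `rank` arguments of `ncclCommInitRank`), rather than with the values from the original communicator.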

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.