What
Lazily initialize TL NCCL on the first CUDA collective.

Why?
Both NCCL and CUDA require the CUDA device to be set before team creation. In MPI workloads this is not always possible: the UCC team is created inside MPI_Init, but selecting the device requires knowing the rank and local rank, which are only available after MPI_Init completes.
Replaces #758.