PR #84 adds support for the NCCL TL. If UCC is built with NCCL support, CLs may select TL NCCL for CUDA collectives, i.e. when both the source and destination buffers are of memory type CUDA. NCCL has known limitations, however: for example, launching multiple collectives concurrently on different streams can lead to deadlocks. Users are therefore encouraged to follow the NCCL guidelines to avoid them. From the UCC perspective, this means that if multiple teams are created and the NCCL TL is used, the user should not post CUDA collectives to different teams at the same time.
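The guideline above can be illustrated with a minimal sketch in C. It does not call UCC or NCCL: `demo_team_t`, `demo_req_t`, `demo_post`, `demo_wait`, and `run_serialized` are hypothetical stand-ins introduced only for this example, and the completion step is modeled as an immediate wait. The point is the ordering: a CUDA collective on one team is completed before a collective is posted on the next team, so at most one is outstanding at any time.

```c
#include <assert.h>

/* Hypothetical stand-ins for a team and a collective request. These are
 * NOT UCC types; real code would use the UCC team and request handles. */
typedef struct { const char *name; } demo_team_t;
typedef struct { const demo_team_t *team; int done; } demo_req_t;

static int in_flight     = 0; /* collectives currently outstanding */
static int max_in_flight = 0; /* high-water mark of in_flight */

/* Models posting a collective: one more request becomes outstanding. */
static void demo_post(demo_req_t *req, const demo_team_t *team)
{
    req->team = team;
    req->done = 0;
    in_flight++;
    if (in_flight > max_in_flight) {
        max_in_flight = in_flight;
    }
}

/* Models waiting for completion. A real application would poll the
 * request and progress the library here instead of returning at once. */
static void demo_wait(demo_req_t *req)
{
    req->done = 1;
    in_flight--;
}

/* Runs one collective per team, strictly one after another.
 * Returns the largest number of collectives outstanding at once. */
static int run_serialized(const demo_team_t *teams, int n_teams)
{
    in_flight     = 0;
    max_in_flight = 0;
    for (int i = 0; i < n_teams; i++) {
        demo_req_t req;
        demo_post(&req, &teams[i]); /* post on team i ...             */
        demo_wait(&req);            /* ... and finish before team i+1 */
    }
    return max_in_flight;
}
```

Calling `run_serialized` on an array of two teams returns 1, meaning the two collectives never overlapped. Posting on both teams before waiting on either would raise that value to 2, which is the pattern the note above advises against when the NCCL TL is in use.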