open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

lower coll accelerator priority #12637

Open bosilca opened 3 months ago

bosilca commented 3 months ago

The accelerator collective module (which allocates host memory and moves the data onto the host in order to complete collective communications) has a higher priority than some collective modules that natively support CUDA/ROCm buffers (such as UCC). This leads to terrible performance for most users unless they manually exclude the accelerator collective (via --mca coll ^accelerator).
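To illustrate the problem, here is a minimal toy model of priority-based selection, not Open MPI code. It assumes that, per collective, the highest-priority component implementing it wins. The `Component` class, the `select` helper, the priority values, and the lists of collectives are all hypothetical and chosen only for illustration; the real priorities should be checked with ompi_info.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a coll component; not an Open MPI type."""
    name: str
    priority: int           # illustrative value, not the real default
    collectives: frozenset  # collectives this component implements
    gpu_native: bool        # True if it works on device buffers directly


def select(components, exclude=()):
    """Pick, per collective, the highest-priority component left after exclusion."""
    chosen = {}
    remaining = [c for c in components if c.name not in exclude]
    # Walk from lowest to highest priority so later entries override earlier ones.
    for comp in sorted(remaining, key=lambda c: c.priority):
        for coll in comp.collectives:
            chosen[coll] = comp.name
    return chosen


# Made-up priorities: accelerator sits above ucc, as described in this issue.
COMPONENTS = [
    Component("tuned", 30, frozenset({"allreduce", "bcast", "reduce"}), False),
    Component("ucc", 50, frozenset({"allreduce", "bcast"}), True),
    Component("accelerator", 80, frozenset({"allreduce", "reduce"}), False),
]

if __name__ == "__main__":
    default = select(COMPONENTS)
    workaround = select(COMPONENTS, exclude={"accelerator"})  # like ^accelerator
    print("default allreduce:   ", default["allreduce"])
    print("workaround allreduce:", workaround["allreduce"])
```

In this model the host-staging component takes allreduce by default, and the GPU-native one only gets it once accelerator is excluded, which mirrors what --mca coll ^accelerator does for users today.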

This is definitely not user-friendly. We need to find a way to keep the accelerator collective component from getting in the way of collective components that handle accelerator buffers natively.