openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

TL/MLX5: various optimizations #1012

Open samnordmann opened 1 month ago

samnordmann commented 1 month ago

What

This PR contains various optimizations for TL/MLX5/a2a. In order of importance/relevance: 1) support for rectangular blocks; 2) additional configurations for how we post the WQEs.

We might want to merge this PR as is, or split it into several smaller ones. Either way, this branch provides a working version that can be used as is for performance experiments.

TODO:

One important optimization that is yet to be implemented is support for multiple NICs. So far, our algorithm uses only one NIC.

cc @lappazos @x41lakazam