openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

TL/MLX5: one-sided mcast reliability init #980

Closed: MamziB closed this 6 days ago

MamziB commented 1 month ago

TL/MLX5: one-sided mcast reliability init that will be used for mcast-based allgather

artemry-nv commented 1 month ago

bot:retest

janjust commented 1 month ago

bot:retest

MamziB commented 3 weeks ago

@nsarka @janjust @samnordmann Thanks everyone for the constructive comments. I have pushed all the requested changes. Please take a look and feel free to hit the resolve button if they look good.

MamziB commented 3 weeks ago

/* below data structures are used in async design only */

@samnordmann These options are for algorithm selection and for checking whether one-sided is enabled. Let me remove the options that are not necessary for this PR.

MamziB commented 3 weeks ago

@samnordmann Thanks for the comments. I added all the newly requested changes as a separate commit so that it is easier to track what has changed. At the end, I will squash all the commits into a single one.

MamziB commented 2 weeks ago

@Sergei-Lebedev Thanks Sergey for the comments. I have resolved all of them.

Sergei-Lebedev commented 1 week ago

@MamziB pls rebase

MamziB commented 1 week ago

@Sergei-Lebedev Thanks for the comments. I have rebased it on top of the latest master.