openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License
177 stars 85 forks source link

TL/MLX5: complete the mcast init #900

Closed MamziB closed 4 months ago

MamziB commented 5 months ago

adding the states regarding the team creation and mcast group join

MamziB commented 5 months ago

Thanks, @samnordmann for the comments. They are all resolved. @Sergei-Lebedev do you have any comments on this PR?

MamziB commented 5 months ago

@samnordmann @Sergei-Lebedev Thank you, guys, for the comments. I resolved all of them.

MamziB commented 5 months ago

@Sergei-Lebedev @samnordmann Thanks guys for the commets. Can you please merge this PR? It has some important fixes that we need for the hpcx release. Thanks

MamziB commented 4 months ago

@Sergei-Lebedev @samnordmann I have addressed the rest of the comments. Please let me know if you have more comments.

samnordmann commented 4 months ago

The recently merged PR #921 introduced a small bug here https://github.com/openucx/ucc/blob/c13d26ce479ab919eae80c8934c82a7f833d0c02/src/components/tl/mlx5/tl_mlx5_context.c#L55

The line cleaning up rcache should be removed. Can you fix that please?

MamziB commented 4 months ago

@samnordmann @Sergei-Lebedev I did not see any Finalize issue after disabling the mcast flag. So, the PR is now ready. FYI, I tested it on HPCAC on multiple node with mcast flags ON/OFF and it was passed.

MamziB commented 4 months ago

@Sergei-Lebedev I resolved the merge conflict with master