openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

TL/MLX5: add librdmacm linkage #905

Closed: janjust closed this 5 months ago

janjust commented 5 months ago

What

Fix Makefile.am: add the librdmacm dependencies.

Why?

Fixes several discovered bugs and showstoppers:
- https://redmine.mellanox.com/issues/3752155
- http://hpcweb.lab.mtl.com/hpc/mtr_scrap/users/qa_sharp/scratch/ucc/20240123_172600_774704_13631_r-hpc-gpu03/
- https://redmine.mellanox.com/issues/3403021

janjust commented 5 months ago

Done. FWIW, this change existed in the private repo and somehow never made it in during the numerous mcast merges. I guess we don't have mcast enabled in our CI?

Sergei-Lebedev commented 5 months ago

> Done. FWIW, this change existed in the private repo and somehow never made it in during the numerous mcast merges. I guess we don't have mcast enabled in our CI?

Right, MLX5 CI should work once https://github.com/openucx/ucc/pull/806 is merged.

Sergei-Lebedev commented 5 months ago

You need to change the commit title, not the PR title, to make CI happy.

janjust commented 5 months ago

The force push was to change the title in the commit itself, not just in the PR.

Sergei-Lebedev commented 5 months ago

> The force push was to change the title in the commit itself, not just in the PR.

Now it's good, thx.

artemry-nv commented 5 months ago

bot:retest

artemry-nv commented 5 months ago

bot:retest

janjust commented 5 months ago

Why does this keep failing? Is it CI or the patch itself? (I can't imagine it being the patch.)

artemry-nv commented 5 months ago

> Why does this keep failing? Is it CI or the patch itself? (I can't imagine it being the patch.)

Infra issues - Jenkins and K8s are under pressure due to release activities.

B-a-S commented 5 months ago

bot:retest

manjugv commented 5 months ago

@janjust, can you please port this into the v1.3.0 branch?

janjust commented 5 months ago

https://github.com/openucx/ucc/pull/910