LaHaine opened this issue.
Rebuilding openmpi5 against ucx 1.17.0 doesn't change this.
I just tried it and cannot reproduce the problem; I tested on two nodes connected over Ethernet. Can you share a minimal test case? How are you launching it, on how many nodes, and which backend is Open MPI using?
Can you try downgrading UCX to 1.15 from the update.3.1 directory?
I don't see anything about incompatibilities between 1.15 and 1.17 on the UCX release page.
This is on AlmaLinux 9.4. After updating ucx-ohpc to 1.17.0-320.ohpc.1.1.x86_64, all binaries built with openmpi5-gnu13 fail like this:
The same binary runs fine with openmpi5-gnu14-ohpc-5.0.5-320.ohpc.2.1.x86_64.