open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

PR 12101 broke support for PRTE_MCA_ras_base_launch_orted_on_hn #12150

Closed: hppritcha closed this issue 11 months ago

hppritcha commented 11 months ago

The SHA advance for the 3rd-party/prrte submodule done in PR #12101 pulled in a bug that broke support for PRTE_MCA_ras_base_launch_orted_on_hn when it is set to 1.

This parameter is important when

The bug is supposed to be fixed in PRRTE master, but the fix is not yet in the release branches used by Open MPI.

This problem is in both main and 5.0.x at the moment.

I'm marking this as critical because sites using HPE SS11 that do not support PMIx in SLURM or PALS have no alternative to a prrte-based launch, so there is currently a failure to launch (at least easily) on these platforms. I believe this is the case for ORNL systems.
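
For readers unfamiliar with the parameter, here is a minimal sketch of how it is typically set for a prrte-based launch: it is exported as an environment variable before invoking the launcher. The parameter name is taken from this issue; the `mpirun -np` command line, the helper name `build_launch`, and the binary `./my_app` are illustrative assumptions, not something prescribed here.

```python
import os

# Parameter name as given in this issue; "1" enables it.
PARAM = "PRTE_MCA_ras_base_launch_orted_on_hn"


def build_launch(app, nprocs, base_env=None):
    """Return (command, environment) for a launch with the parameter set to 1.

    `base_env` defaults to a copy of the current process environment.
    The `mpirun -np N app` form is an assumed, typical invocation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[PARAM] = "1"
    cmd = ["mpirun", "-np", str(nprocs), app]
    return cmd, env


if __name__ == "__main__":
    # ./my_app is a hypothetical MPI binary used only for illustration.
    cmd, env = build_launch("./my_app", 4)
    print(f"{PARAM}={env[PARAM]}", " ".join(cmd))
```

On a system with Open MPI installed, the returned pair could be passed to `subprocess.run(cmd, env=env, check=True)`; with the bug described above, such a launch would be expected to fail until the prrte SHA is advanced.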

hppritcha commented 11 months ago

see comments in 2f9cabf741bd2d3c75e607a53d06466af0ec134d for more details

rhc54 commented 11 months ago

Are you saying that advancing the submodule pointers does not fix the problem? Or are you just filing this as a reminder to update the pointers before release (which is planned anyway)?

hppritcha commented 11 months ago

The latter, so we (Open MPI) don't forget to advance the SHAs to pull in the fix once you've committed it to the prrte release branches.

janjust commented 11 months ago

fixed with #12152