Open dalcinl opened 5 days ago
Have noticed some semi-random errors on PRs, but haven't seen that particular error message before. I suspect that specific error may be indicative of the growing disconnect between the PMIx master branch and the OMPI fork of PRRTE. I've tried to start some discussion over here about it, but due to Supercomputing and holidays it will take some time to address the problem.
The line number indicates that the PMIx submodule isn't current - indeed, a quick glance shows it is far behind the head of the master branch. I can post a PR to update it, just to see if it impacts anything.
However, the overall problem could have nothing to do with PMIx or PRRTE. 🤷‍♂️ Difficult to say.
Sigh - can't update PMIx as the OMPI PRRTE fork is simply too out-of-sync. 🤷‍♂️ Not much I can help with, I'm afraid.
FWIW: looking at the OMPI nightly regression tests (their own test suite), it appears that the one-sided tests are uniformly failing in both the main and v5.0 branches. Seeing the same failure signatures that are being reported elsewhere by Debian.
Interestingly enough, I'm not seeing the failure you are reporting here - but given it is intermittent, that may be simply luck.
Nightly mpi4py tests with ompi@main have been failing from time to time. The tests pass after a re-run, so the failure is not easily reproducible. The latest failure produces the following output.
Full logs here