Closed by abouteiller 3 years ago
@rhc54 I believe the observed behavior is a combination of two separate issues:
[saturn:55357] [mpiexec-saturn-55357@0,0] errmgr:dvm: for proc [mpiexec-saturn-55357@0,2] state COMMUNICATION FAILURE
[saturn:55357] [mpiexec-saturn-55357@0,0] Comm failure: daemons terminating - recording daemon [mpiexec-saturn-55357@0,2] as gone
[saturn:55357] [mpiexec-saturn-55357@0,0] Comm failure: 1 routes remain alive
This change probably comes from baec91f3, which removed the FORCED_TERMINATE macro; the code now goes to unconditional cleanup, even if enable-recovery has been set.
The issue is now resolved in PR #960.
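To make the expected versus observed behavior concrete, here is a minimal toy model of the decision described above. It is not PRTE's errmgr code: `DvmState`, `on_comm_failure`, and the `enable_recovery` field are hypothetical names chosen for illustration, and the real state machine is considerably more involved. The sketch only shows the guard that one would expect around cleanup when recovery has been requested.

```python
from dataclasses import dataclass, field


@dataclass
class DvmState:
    """Toy stand-in for the DVM's view of its daemons (hypothetical, not PRTE code)."""
    enable_recovery: bool
    alive_daemons: set = field(default_factory=set)
    terminated: bool = False


def on_comm_failure(state: DvmState, daemon: int) -> str:
    """Handle a lost daemon and return the action taken."""
    # Record the daemon as gone, as the log lines above report.
    state.alive_daemons.discard(daemon)
    if state.enable_recovery and state.alive_daemons:
        # Recovery was requested: keep running on the remaining daemons.
        return "continue"
    # No recovery requested (or nothing left to run on): clean up everything.
    state.terminated = True
    return "terminate"
```

In this model, the behavior reported here corresponds to always taking the final branch, regardless of `enable_recovery`.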
Background information
When using PRTE with FT (e.g., with Open MPI mpiexec --with-ft ulfm), the error message /grpcomm_bmg_module.c:199] PMIx Error: PACK-MISMATCH is issued, and the application is aborted immediately.
What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)
What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)
Please describe the system on which you are running
Details of the problem
To replicate the issue, one can use Open MPI and ompi-tests-public