sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

RFE: support ULFM2 mpiexec in cafrun #623

Open nathanweeks opened 5 years ago

nathanweeks commented 5 years ago

ULFM2 is a fork of Open MPI that implements fault tolerance based on ULFM (User-Level Failure Mitigation).

When OpenCoarrays 2.4.0 is configured with CAF_ENABLE_FAILED_IMAGES=TRUE, the cafrun wrapper script adds the --disable-auto-cleanup option to mpiexec so that an MPICH-based MPI can continue execution after an MPI process fails. Users who don't want fault tolerance can pass the (MPICH-mpiexec-specific) --reenable-auto-cleanup option to cafrun.

The ULFM2 mpiexec has neither of these options; rather, the equivalent of --disable-auto-cleanup is assumed by default. Fault tolerance can be disabled with the --disable-recovery mpiexec option.

It would be beneficial if cafrun could accommodate the ULFM2 mpiexec syntax. One possible approach would be to rename the --reenable-auto-cleanup cafrun option to something more generic and descriptive (like --disable-failed-images), and have cafrun select the appropriate mpiexec option based on the MPI implementation.
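To make the suggestion concrete, here is a rough bash sketch of the selection logic. It is not taken from cafrun: the function name `ft_mpiexec_flag` and the flavor labels `mpich` and `ulfm2` are made up for illustration, and it assumes the MPI flavor has already been detected elsewhere (for example at configure time). The only mpiexec options used are the two mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cafrun.
#   $1: MPI flavor, "mpich" or "ulfm2" (assumed to be detected elsewhere)
#   $2: "true" if the user passed the proposed --disable-failed-images option
# Prints the mpiexec option to add (possibly an empty line).
ft_mpiexec_flag() {
  local flavor="$1" disable_ft="${2:-false}"
  case "${flavor}:${disable_ft}" in
    mpich:false) echo "--disable-auto-cleanup" ;;  # keep running after a process failure
    mpich:true)  echo "" ;;                        # leave mpiexec's auto cleanup in place
    ulfm2:false) echo "" ;;                        # ULFM2 tolerates failures by default
    ulfm2:true)  echo "--disable-recovery" ;;      # turn ULFM2 fault tolerance off
    *) echo "unknown MPI flavor: ${flavor}" >&2; return 1 ;;
  esac
}
```

cafrun could then append the result of something like `ft_mpiexec_flag "${mpi_flavor}" "${disable_failed_images}"` to the mpiexec command line, where both variable names are likewise placeholders.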

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
