
How do I make mpirun continue despite node failures? #9882

Closed by hatmer 2 years ago

hatmer commented 2 years ago

Background information

I am implementing a fault-tolerant version of a large software project (ArgoDSM) that relies on MPI for managing nodes.

What version of Open MPI are you using?

v4.1.2

Describe how Open MPI was installed

tarball

Please describe the system on which you are running


Details of the problem

I have a two-node system. I want the individual nodes to continue running after the network link between them is severed.

When I simulate a network failure (by cutting a node off from the network using iptables), mpirun crashes and I get the following error:

Connection to xx.xx.xx.xx closed by remote host.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
...
* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

I understand this to mean that mpirun sends a kill -9 (SIGKILL) when it detects that it cannot reach the remote host. How do I prevent mpirun from terminating? Instead of the processes being killed, I would like to set MPI_ERRORS_RETURN and handle the "node unreachable" event as an MPI error.
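For reference, this is a minimal sketch of what I have in mind, assuming the failure would surface as an error return from the communication call rather than as a job-level abort (the peer rank and the send are just placeholders for illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Ask MPI to return error codes instead of aborting the whole job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical exchange with the other node in the two-node setup. */
    int peer = (rank + 1) % 2;
    int payload = rank;
    int rc = MPI_Send(&payload, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "rank %d: peer unreachable: %s\n", rank, msg);
        /* ...keep running locally instead of being killed... */
    }

    MPI_Finalize();
    return 0;
}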

I am aware of ftmpi, but as far as I can tell it does not prevent mpirun from terminating due to an unreachable node.

rhc54 commented 2 years ago

I'm afraid that will not work - OMPI v4 has no concept of continuing in that situation.

bosilca commented 2 years ago

You will need to install your own version of OMPI 4.x with resilient capabilities as indicated here.
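With such a ULFM-enabled build, a peer failure is reported as an error class on the communication call, and the survivors can revoke the broken communicator and shrink it to continue among themselves. A rough sketch (the MPIX_* names come from the ULFM extensions exposed through mpi-ext.h; treat this as an outline to check against the build you install, not a complete recovery protocol):

#include <mpi.h>
#include <mpi-ext.h>   /* ULFM fault-tolerance extensions (MPIX_*) */
#include <stdio.h>

/* Replace a communicator that has lost a process with one containing
 * only the surviving processes. */
static MPI_Comm recover(MPI_Comm comm)
{
    MPI_Comm survivors;
    MPIX_Comm_revoke(comm);             /* unblock any rank still stuck in comm */
    MPIX_Comm_shrink(comm, &survivors); /* build a communicator of alive ranks  */
    return survivors;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

    int rank, value = 0;
    MPI_Comm_rank(comm, &rank);

    int rc = MPI_Bcast(&value, 1, MPI_INT, 0, comm);
    int eclass;
    MPI_Error_class(rc, &eclass);
    if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
        comm = recover(comm);
        /* ...carry on using the shrunken communicator... */
    }

    MPI_Finalize();
    return 0;
}

Depending on the application, MPIX_Comm_agree may also be needed so that all survivors reach a consistent decision about the failure before continuing.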

jsquyres commented 2 years ago

Just to clarify: both answers are technically correct. 😉

hatmer commented 2 years ago

I installed Open MPI 5.0.0 (configured with the --with-ft=mpi flag) and it works perfectly. Thank you!