mpiwg-abi / abi-issues

Tickets for the MPI Forum
http://www.mpi-forum.org/

discuss launchers (mpirun and mpiexec) #6

Open jeffhammond opened 1 year ago

jeffhammond commented 1 year ago

Problem

There is no guarantee of compatibility across launchers (e.g. mpirun).

Proposal

We should not try to solve this problem, because it can be solved without additional specification.

Existing practice allows for Slurm, PBS, etc. to launch MPI programs compiled with either MPICH or Open-MPI.

The launchers bundled with these libraries do not interoperate, but a third-party tool can solve this straightforwardly by wrapping the existing launchers.

Changes to the Text

Impact on Implementations

No additional work is required, since existing third-party launchers are supported by MPICH and Open-MPI.

Impact on Users

Some users may complain if we do not solve this thoroughly.

References and Pull Requests

https://github.com/jeffhammond/blog/blob/main/MPI_Needs_ABI_Part_3.md

jedbrown commented 1 year ago

> For users who expect to use mpirun or mpiexec, a hack is to figure out what launcher the program expects and then invoke it. In this design, mpiexec can be a shell script that calls strings or some other introspection method on the binary and figures out if it's MPICH or Open-MPI or Intel MPI or MVAPICH2, and then calls the implementation specific mpiexec. This is not an elegant method but it probably works for a lot of users, and isn't any worse than the mess we have right now.

I think that was written in a different context, but if we have a standard ABI, then there will be no strings and you can run the binaries with any library (assuming dynamic linking). Using a Hydra launcher would presumably ensure that the MPICH library is used, and similarly for the ORTE launcher.

Static linking is another matter, rendering ABI moot. Of course it would be ideal if Hydra and ORTE launchers could settle on a standard protocol that resource managers use (PMIx or whatever) to talk to the executable. I agree that's out of scope here.
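For reference, the strings-based dispatch described in the quoted passage could look roughly like the sketch below. It is written in Python rather than shell, and everything specific in it is an assumption: the marker strings in `MARKERS` are illustrative fingerprints that have not been verified against real binaries, the launcher names `mpiexec.openmpi` and `mpiexec.mpich` are assumed to be on `PATH`, and `find_program` is a crude hypothetical heuristic for locating the executable among the arguments.

```python
import os
import re
import sys

# Hypothetical marker table: byte pattern -> launcher command.
# These fingerprints and launcher names are illustrative assumptions.
MARKERS = [
    (b"ompi_mpi_comm_world", "mpiexec.openmpi"),
    (b"libmpich", "mpiexec.mpich"),
]


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII, roughly what `strings` would print."""
    pattern = rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}"
    return re.findall(pattern, data)


def pick_launcher(data, markers=MARKERS):
    """Return the launcher for the first marker found in the binary, else None."""
    found = printable_strings(data)
    for marker, launcher in markers:
        if any(marker in s for s in found):
            return launcher
    return None


def find_program(args):
    """Guess the program: the first argument that is an executable file."""
    for arg in args:
        if os.path.isfile(arg) and os.access(arg, os.X_OK):
            return arg
    return None


def main(argv):
    """Inspect the target binary, then replace this process with its launcher."""
    program = find_program(argv[1:])
    if program is None:
        sys.exit("could not find an executable among the arguments")
    with open(program, "rb") as f:
        launcher = pick_launcher(f.read())
    if launcher is None:
        sys.exit("could not identify the MPI implementation of " + program)
    # Forward all original arguments unchanged to the chosen launcher.
    os.execvp(launcher, [launcher] + argv[1:])
```

A script installed as `mpiexec` would end with `main(sys.argv)`. With a standard ABI the markers would disappear from the binary, which is the point made above.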

gonzalobg commented 1 year ago

> We should not try to solve this problem, because it can be solved without additional specification.

+1. We can always try to solve this later, if this turns out to be a problem.

jeffhammond commented 1 year ago

> For users who expect to use mpirun or mpiexec, a hack is to figure out what launcher the program expects and then invoke it. In this design, mpiexec can be a shell script that calls strings or some other introspection method on the binary and figures out if it's MPICH or Open-MPI or Intel MPI or MVAPICH2, and then calls the implementation specific mpiexec. This is not an elegant method but it probably works for a lot of users, and isn't any worse than the mess we have right now.

> I think that was written in a different context, but if we have a standard ABI, then there will be no strings and you can run the binaries with any library (assuming dynamic linking). Using a Hydra launcher would presumably ensure that the MPICH library is used, and similarly for the ORTE launcher.

That's an interesting way to look at it (and I like it). It's a nice situation if users understand that the launcher prescribes the implementation to be used, since that means we don't have to solve the universal launcher problem.

The place where things get interesting is singleton initialization, where the application is started without a launcher and then spawns processes. Today, this does not always work because some implementations can't create multiple processes without environment variables being set. This is a solvable problem, but it is not a high enough priority for most implementations to address.
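To make the singleton case concrete, here is a minimal sketch of how a process might guess whether it was started by a launcher, by checking for variables that launchers commonly export to each rank. The variable list is an assumption and is not exhaustive, `started_by_launcher` is a hypothetical helper name, and real implementations may decide this differently.

```python
import os

# Variables commonly exported to each rank by a launcher.
# Illustrative assumption only: the list is not exhaustive.
LAUNCHER_VARS = (
    "OMPI_COMM_WORLD_RANK",  # Open MPI mpirun
    "PMI_RANK",              # MPICH Hydra
    "PMIX_RANK",             # PMIx-based launchers
    "SLURM_PROCID",          # Slurm srun
)


def started_by_launcher(env=None):
    """Return True if any known launcher variable is present in env."""
    env = os.environ if env is None else env
    return any(name in env for name in LAUNCHER_VARS)
```

A process for which this returns False is a candidate singleton: it has none of the launcher-provided environment that some implementations need before they can spawn further processes.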

jedbrown commented 1 year ago

Regarding singletons, I think laptop installations would have a default (managed by apt alternatives, modules, and similar) and resource managers would select (at sbatch level) based on modules or explicit parameters. I anticipate this environment management being uniformly easier than current practice.