open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

MPI Generalized requests failing on 4.1.0 #8402

Closed: amckinstry closed this 3 years ago

amckinstry commented 3 years ago

This is on Debian unstable, with Open MPI 4.1.0, in the mpi4py (3.0.3) test suite.

The errors we're seeing are here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979480

The tests pass with Open MPI 4.0.5.

In both cases, /etc/openmpi/openmpi-mca-params.conf contains:

btl_base_warn_component_unused=0
# Avoid openib in case applications use fork: see https://github.com/ofiwg/libfabric/issues/6332
# If you wish to use openib and know your application is safe, remove the following:
# Similarly for UCX: https://github.com/open-mpi/ompi/issues/8367
btl = ^uct,openib,ofi
pml = ^ucx
osc = ^ucx,pt2pt

to allow testing on single-node systems (with oversubscription enabled).

The failing case uses PMIx 4.0.0; the working case uses PMIx 3.2.2.
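In these settings, a leading `^` on a framework parameter (`btl`, `pml`, `osc`) tells Open MPI to exclude the listed components rather than select only them. The sketch below reads the file above and reports which components each framework excludes. It is an illustrative helper, not Open MPI's own parser. It assumes one `key = value` pair per line and treats only lines that begin with `#` as comments. The names `parse_mca_params` and `component_selection` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: parse "key = value" lines from an MCA params file.
# Assumption: only lines whose first non-blank character is '#' are comments.
def parse_mca_params(text: str) -> Dict[str, str]:
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params

# Hypothetical helper: interpret a framework selection value.
# A leading '^' means "exclude these components"; otherwise "include only these".
def component_selection(value: str) -> Tuple[str, List[str]]:
    mode = "include"
    if value.startswith("^"):
        mode, value = "exclude", value[1:]
    components = [c.strip() for c in value.split(",") if c.strip()]
    return mode, components

CONF = """\
btl_base_warn_component_unused=0
# Avoid openib in case applications use fork: see https://github.com/ofiwg/libfabric/issues/6332
# If you wish to use openib and know your application is safe, remove the following:
# Similarly for UCX: https://github.com/open-mpi/ompi/issues/8367
btl = ^uct,openib,ofi
pml = ^ucx
osc = ^ucx,pt2pt
"""

if __name__ == "__main__":
    params = parse_mca_params(CONF)
    for framework in ("btl", "pml", "osc"):
        mode, comps = component_selection(params[framework])
        print(f"{framework}: {mode} {', '.join(comps)}")
```

Running it prints one line per framework, for example `btl: exclude uct, openib, ofi`. Exclusion is used here so that any remaining transports (such as shared memory on a single node) stay eligible.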

rhc54 commented 3 years ago

Just to be clear: are you saying that OMPI v4.1.0 passes these tests if configured with PMIx v3.2.2, but fails if configured with PMIx v4.0.0? There was a bug in generalized requests that has been fixed on the branch, but that had nothing to do with PMIx (AFAIK).

amckinstry commented 3 years ago

A quick test shows that building 4.1.0 against PMIx 3.2.2 doesn't fix it.

amckinstry commented 3 years ago

So something changed between 4.0.5 and 4.1.0 (or in our related config).

jsquyres commented 3 years ago

This was fixed in #8340 (nothing to do with PMIx).

jsquyres commented 3 years ago

To clarify: this was reported in #8340 and fixed in the v4.1.x branch in #8348.

amckinstry commented 3 years ago

This solves the issue for us.

jsquyres commented 3 years ago

@amckinstry Excellent. FYI: we only have one or two more things before we plan to roll a v4.1.1rc. Should be "Real Soon Now".