drew-parsons opened this issue 2 years ago
Following the discussion at https://github.com/open-mpi/ompi/issues/7813#issuecomment-644823020, I find that my tests pass (both armci-mpi and nwchem) if I set the environment variable
OMPI_MCA_osc=ucx
(my system does not recognize OMPI_MCA_pml)
Is it expected that this setting should be required? If so, then it's just a runtime configuration issue, not a bug.
Worth also pointing out, however, that even though nwchem/openmpi does run on 2 nodes with OMPI_MCA_osc=ucx, it's unusable in practice.
A test case finishes on 1 node (16 CPUs) in 4000 sec, around 1 hr. Running on 2 nodes (2×16 processes), I gave up and killed it after more than 122292.3 sec, nearly 2 days.
nwchem/armci-mpi built with mpich runs the same job over 2 nodes in 2800 sec.
@drew-parsons Thanks for the report! So it seems that there are multiple issues here: the non-UCX OSC component (osc/pt2pt I guess) failing to allocate the window, and the UCX OSC either showing poor performance or getting stuck. I'm not sure why the osc/pt2pt component failed to allocate the window; running with --mca osc_base_verbose 100 could tell us more. Also, could you try the Open MPI 5.0 release branch by any chance? There have been plenty of improvements to the implementations in the upcoming release.
I tried running mpirun with --mca osc_base_verbose 100 (without setting OMPI_MCA_osc=ucx), but it's not providing any more information, just the same:
[31] ARMCI assert fail in gmr_create() [src/gmr.c:109]: "alloc_slices[alloc_me].base != NULL"
[31] Backtrace:
[31] 10 - nwchem(+0x2836605) [0x55591b936605]
[31] 9 - nwchem(+0x282cc1c) [0x55591b92cc1c]
[31] 8 - nwchem(+0x282c358) [0x55591b92c358]
...
Time constraints make it difficult to test v5 for the time being.
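For reference, the failing assert boils down to a pattern like the following minimal C sketch (my own reconstruction, not armci-mpi's actual gmr_create() code): each rank allocates its slice with MPI_Win_allocate and expects a non-NULL base pointer back.

/* Minimal C sketch (my own reconstruction, not armci-mpi's gmr_create()):
 * every rank allocates its slice of the global region with MPI_Win_allocate
 * and expects a non-NULL base pointer, which is what the failing assert checks. */
#include <assert.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    void *base = NULL;
    MPI_Win win;
    MPI_Aint size = 1024 * 1024;   /* arbitrary non-zero slice size per rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collective allocation of RMA memory attached to a window. */
    MPI_Win_allocate(size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    /* Mirrors the "alloc_slices[alloc_me].base != NULL" assert in src/gmr.c. */
    assert(base != NULL);
    printf("rank %d: window memory at %p\n", rank, base);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Built with mpicc and run with the same -H host-2:1,host-3:1 arguments, this should show whether plain MPI_Win_allocate already misbehaves without any of the armci-mpi machinery on top.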
@drew-parsons (most) debugging code is stripped out unless Open MPI is configure'd with --enable-debug.
So I am afraid you have to manually rebuild Open MPI (and then your app, since debug and non-debug Open MPI builds might not be ABI compatible) in order to get useful logs.
Meanwhile, what happens if you run
mpirun --mca pml ucx ...
or
mpirun --mca pml ob1 ...
@devreal isn't it a known issue that osc/pt2pt uses btl/openib, which might not be working?
The pml options don't shed more light, unfortunately (without yet rebuilding openmpi):
$ mpirun.openmpi --mca pml ucx -H host-2:1,host-3:1 -n 2 tests/contrib/non-blocking/simple
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.
This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.
Host: host-2
Framework: pml
--------------------------------------------------------------------------
[host-2:43451] PML ucx cannot be selected
[host-3:26169] PML ucx cannot be selected
[host-2:43446] 1 more process has sent help message help-mca-base.txt / find-available:none found
[host-2:43446] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
or
$ mpirun.openmpi --mca pml ob1 -H host-2:1,host-3:1 -n 2 tests/contrib/non-blocking/simple
[host-2:43460] *** An error occurred in MPI_Win_allocate
[host-2:43460] *** reported by process [2976317441,0]
[host-2:43460] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
[host-2:43460] *** MPI_ERR_WIN: invalid window
[host-2:43460] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[host-2:43460] *** and potentially your MPI job)
[host-2:43455] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[host-2:43455] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
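In case it helps narrow things down, here is a minimal C sketch (my own, not from the armci-mpi test suite) that switches MPI_COMM_WORLD to MPI_ERRORS_RETURN, so MPI_Win_allocate reports its error class and string instead of aborting under MPI_ERRORS_ARE_FATAL:

/* Minimal sketch (my own, not from armci-mpi): make MPI_Win_allocate return its
 * error instead of aborting, so the error class and message can be printed. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    void *base = NULL;
    MPI_Win win;

    MPI_Init(&argc, &argv);

    /* Errors raised on this communicator are returned to the caller
     * rather than handled by MPI_ERRORS_ARE_FATAL. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rc = MPI_Win_allocate((MPI_Aint)(1 << 20), 1, MPI_INFO_NULL,
                              MPI_COMM_WORLD, &base, &win);

    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len, eclass;
        MPI_Error_class(rc, &eclass);
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Win_allocate failed: error class %d: %s\n",
                eclass, msg);
    } else {
        MPI_Win_free(&win);
    }

    MPI_Finalize();
    return 0;
}

Since MPI_Win_allocate is collective, ranks that succeed while another fails may still hang, but the failing rank should at least print a more specific message than the generic MPI_ERR_WIN abort above.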
This is very puzzling ... the osc/ucx component is there and usable (since OMPI_MCA_osc=ucx does not cause an error), but pml/ucx is either absent or unusable.
Can you
mpirun --mca pml_base_verbose 10 --mca pml ucx ...
and see if it sheds some light on why pml/ucx cannot be used?
Background information
What version of Open MPI
4.1.2
Describe how Open MPI was installed
Debian packages 4.1.2-1 (official .deb for debian testing), https://packages.debian.org/bookworm/libopenmpi-dev
Please describe the system on which you are running
Also using
Details of the problem
armci-mpi is currently failing to run across multiple nodes when built with Open MPI. It runs fine on one node. The nodes form a cluster managed by OpenStack, with 16 CPUs per node.
The problem was reported for armci-mpi at https://github.com/pmodels/armci-mpi/issues/33 . armci-mpi does not fail when built with mpich, so the armci-mpi developers attribute the error to a bug in Open MPI RMA support.
armci-mpi tests pass when run on 1 node (with MPIEXEC = mpiexec -n 2). Most of the tests fail when run across two nodes (e.g. with MPIEXEC="mpiexec -H host-1:1,host-2:1 -n 2"). Running one of the failing tests manually (with or without ARMCI_USE_WIN_ALLOCATE) gives the errors quoted above.
Additionally, when the ARMCI_VERBOSE=1 environment variable is set, the test freezes after reporting its configuration values, before reaching the crash point.
Is it feasible to include armci-mpi in Open MPI CI testing, run across 2 nodes?
The problem was originally detected in the Debian build of nwchem (7.0.2-1 in debian testing), which uses armci-mpi. Testing against the sample water script at https://nwchemgit.github.io/Sample.html, the nwchem error message is the ARMCI assert failure in gmr_create() quoted above.
I've tried a fresh rebuild of armci-mpi, ga and nwchem against Open MPI 4.1.2, but the failure persists. nwchem runs successfully over multiple nodes when built against mpich.