simonsobs-uk / data-centre

This tracks the issues in the baseline design of the SO:UK Data Centre at Blackett
https://souk-data-centre.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Warnings when running OpenMPI #2

Closed ickc closed 1 year ago

ickc commented 1 year ago

Copied from email thread:

When running MPI jobs with Open MPI 3, I get the following warnings. Is this normal, or is there a configuration problem?

From stderr:

--------------------------------------------------------------------------
[[29587,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
 Host: wn1906370

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
[wn1906370.in.tier2.hep.manchester.ac.uk:149808] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[wn1906370.in.tier2.hep.manchester.ac.uk:149808] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

From stdout:

[1689097992.669727] [wn5914090:142928:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689097992.671404] [wn5916090:65908:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689097992.696781] [wn1906370:149888:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689097992.697059] [wn1906370:149889:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
ickc commented 1 year ago

Copied from email thread:

The minimal reproducible example would be

git clone git@github.com:ickc/htcondor_ex.git
cd htcondor_ex
make download
cd examples/mpi-hello-world
condor_submit mpi.ini
tail -f mpi.out mpi.err

This emits the 2 errors I copied in the last email. The first error is about the openib interface. It is really a question of whether the system has better interconnects available; if not, setting OMPI_MCA_btl_base_warn_component_unused=0 removes it (as suggested by the message itself). The other error is more cryptic and is related to shared memory limits.
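The UCX message itself points at `ipcs -l`. A minimal sketch of that check, meant to be run on a worker node (for example from inside the job script), is below. The helper name `show_shm_limits` is made up for illustration, and the script assumes a Linux node where the `ipcs` utility is installed:

```shell
#!/bin/bash
# Print the System V IPC limits that the UCX error message asks about.
# show_shm_limits is a hypothetical helper name, not part of any tool.
show_shm_limits() {
    if command -v ipcs >/dev/null 2>&1; then
        # -l prints the limits; this is the command quoted in the UCX error.
        ipcs -l || echo "ipcs -l failed on $(uname -n)" >&2
    else
        echo "ipcs not found on $(uname -n)" >&2
    fi
    # Always succeed so a failed check never aborts the job itself.
    return 0
}

show_shm_limits
```

Because the MPI ranks land on several worker nodes, the output is only meaningful for the node the script happens to run on.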

ickc commented 1 year ago

Copied from email thread:

After some searching, it seems the 2nd error is also related to networking: open-mpi/ompi#6084, which references openucx/ucx#3080 and openucx/ucx#3084. Also see https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/ib-and-roce.html (which is the doc for OpenMPI 5; see https://www.open-mpi.org/faq/?category=building for OpenMPI 3). Probably the OpenMPI on the system is built with --with-ucx=...

So both errors are probably related to how the network is configured at Blackett and how to configure MPI to get the most performance from the hardware.

ickc commented 1 year ago

Copied from email thread:

From Robert:

the mpi hello world seems to be working for me now. I still get the shmget errors, but also the hello world messages:

-bash-4.2$ cat mpi.out
[1689171404.032094] [wn5917090:121332:0]            sys.c:618  UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689171404.033081] [wn5916340:101461:0]            sys.c:618  UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689171404.061509] [wn5914340:195243:0]            sys.c:618  UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1689171404.062160] [wn5914340:195244:0]            sys.c:618  UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
Hello world from processor wn5914340.in.tier2.hep.manchester.ac.uk, rank 3 out of 4 processors
Hello world from processor wn5914340.in.tier2.hep.manchester.ac.uk, rank 0 out of 4 processors
Hello world from processor wn5916340.in.tier2.hep.manchester.ac.uk, rank 1 out of 4 processors
Hello world from processor wn5917090.in.tier2.hep.manchester.ac.uk, rank 2 out of 4 processors

Here's what I did:

1) cp /usr/share/doc/condor-9.0.17/examples/openmpiscript $HOME
2) add after line 59 (#MPDIR=/usr/lib64/openmpi):
   . /etc/profile.d/modules.sh
   module load mpi/openmpi3-x86_64
   MPDIR=$MPI_HOME
3) change mpi.ini to point to $HOME/openmpiscript instead of ../../openmpiscript
4) submit job
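Step 2 can be written a little more defensively. The sketch below is one way the lines added to the copied openmpiscript might look. The path and module name are the ones quoted above; the readability check, the `MODULES_INIT` variable, and the fallback message are additions for illustration only:

```shell
# Added after the commented-out "#MPDIR=/usr/lib64/openmpi" line in openmpiscript.
MODULES_INIT=/etc/profile.d/modules.sh   # path quoted in the steps above
if [ -r "$MODULES_INIT" ]; then
    # Make the "module" command available in this non-login shell.
    . "$MODULES_INIT"
    # Load Open MPI 3 instead of relying on the worker node's condor config.
    module load mpi/openmpi3-x86_64
    # The steps above take MPDIR from MPI_HOME once the module is loaded.
    MPDIR=$MPI_HOME
else
    echo "cannot read $MODULES_INIT; MPDIR left unchanged" >&2
fi
```

The guard only matters if the job lands on a node without environment modules; on such a node the script would still fall through to whatever MPDIR was configured before.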

The openmpiscript script in the github repo is old, so might not work properly any more. Step 2 is required because the condor configuration on the WNs is still pointing to openmpi 1.x. I think it's better to use the module than to rely on the configuration on the WNs. You could also try to load the mpich2 module there. I haven't tried it interactively yet, this will have to wait until next week.

ickc commented 1 year ago

Copied from email thread:

The warnings/errors above are not about being unable to run MPI jobs. I have a modified version of that script that works: https://github.com/ickc/htcondor_ex/blob/main/src/openmpiscript The errors and warnings accompany an MPI job that otherwise runs successfully. Both are related to network configuration. Copied from an earlier email:

The first error is about the openib interface. It is really a question of whether the system has better interconnects available; if not, setting OMPI_MCA_btl_base_warn_component_unused=0 removes it (as suggested by the message itself). The other error is more cryptic and is related to shared memory limits.

After some searching, it seems the 2nd error is also related to networking: https://github.com/open-mpi/ompi/issues/6084 which references https://github.com/openucx/ucx/pull/3080 and https://github.com/openucx/ucx/issues/3084 Also see https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/ib-and-roce.html (which is the doc for OpenMPI 5; see https://www.open-mpi.org/faq/?category=building for OpenMPI 3). Probably the OpenMPI on the system is built with --with-ucx=...

So both errors are probably related to how the network is configured at Blackett and how to configure MPI to get the most performance from the hardware.

rwf14f commented 1 year ago

The problem with the shared memory error is that ucx is trying to use kernel hugepages which requires elevated permissions, see https://github.com/openucx/ucx/issues/3023. The merge mentioned in the ticket fixes this by hiding the error if it's due to permission problems (EPERM), but the version available in C7 doesn't have this patch. I wonder if there's a way of disabling the relevant ucx module on the command line or with environment variables.
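On the question of environment variables, two candidates are sketched below. Neither has been tried on Blackett, and whether either one actually silences the message with the C7 packages is an assumption. `OMPI_MCA_pml=ob1` asks Open MPI to use its ob1 point-to-point layer rather than the UCX one, and `UCX_LOG_LEVEL` controls how much UCX logs:

```shell
# Candidate 1: select the ob1 PML so that the UCX PML is not used.
# Untested here; it only helps if the message comes from the UCX PML.
export OMPI_MCA_pml=ob1

# Candidate 2 (blunter): have UCX log only fatal problems.
# This would hide other UCX errors as well, so it is left commented out.
# export UCX_LOG_LEVEL=fatal
```

Either variable would need to be exported before mpirun is invoked, for example in the openmpiscript wrapper.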

ickc commented 1 year ago

@rwf14f, thanks! I think we can safely ignore the 2nd error (which is more like a warning) for now as it is already fixed in newer versions.

How about the 1st warning? It is related to the available network interfaces. If no other interface is available at Blackett, then we can just set export OMPI_MCA_btl_base_warn_component_unused=0.
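A minimal sketch of how that could look in the job's wrapper script. Only the variable itself comes from the warning text; putting it in the wrapper ahead of mpirun is an assumption about where it is most convenient, and `./hello` is a hypothetical program name:

```shell
# Open MPI reads an environment variable OMPI_MCA_<name> as MCA parameter <name>.
export OMPI_MCA_btl_base_warn_component_unused=0

# Equivalent command-line form, shown for reference only:
# mpirun --mca btl_base_warn_component_unused 0 ./hello
```

This only hides the notice; Open MPI still probes for the openib module and falls back to another transport as before.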

rwf14f commented 1 year ago

OpenMPI tries all available communication modules to check what's available on a machine. The first warning is caused by the openib module which manages communication / transfers via Infiniband networks. As we do not have any Infiniband devices you get that warning. Either ignore it or disable it with the environment variable. Afaik, OpenMPI has options to explicitly tell it which modules it should try or not try. This would be another way of avoiding those warnings.
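The module selection mentioned here is done through the `btl` MCA parameter. A short sketch, assuming the goal is simply to stop Open MPI from trying openib at all (this has not been checked against the Blackett build, and `./hello` is a hypothetical program name):

```shell
# Exclude the openib BTL; a leading ^ means "every component except these".
export OMPI_MCA_btl='^openib'

# Equivalent command-line form, shown for reference only:
# mpirun --mca btl ^openib ./hello
```

Compared with silencing the warning, this skips the openib probe entirely, so there is nothing left to warn about.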

ickc commented 1 year ago

Thanks! For now I'm disabling the warning. I think having some documentation on Blackett would help resolve this kind of problem; cf. #6.