open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 237 #9589

Open pankajd-57 opened 3 years ago

pankajd-57 commented 3 years ago


Background information


After using srun to get a shell on the compute node, I run

mpirun -np 2 /bin/hostname

and get the following PMIx error:

A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
PMIX stopped checking at the first component that it did not find.

Host:      compute9
Framework: psec
Component: munge
--------------------------------------------------------------------------

--------------------------------------------------------------------------
It looks like pmix_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during pmix_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
PMIX developer):

  pmix_psec_base_open failed
  --> Returned value -46 instead of PMIX_SUCCESS
--------------------------------------------------------------------------

[compute9:1936444] PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 237

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Open MPI version: 4.1.2a1. Slurm 20.11.7, compiled with pmix-3.2.2.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

It came as part of NVIDIA hpcx-2.9.0.

jsquyres commented 3 years ago

@janjust This is a question about NVIDIA hpcx-2.9.0.

lahwaacz commented 2 years ago

I have noticed this on Arch Linux too. It seems to be a problem with Open MPI's internal (bundled) PMIx library. Building Open MPI against an external PMIx installation (./configure --with-pmix=/usr ...) solved it for me.