Open DaXor-0 opened 2 weeks ago
This option isn't correct: --with-pmi=pmix-install-prefix should be --with-pmix. The output indicates you picked up some other version of PMIx that doesn't include some of the definitions found in the upstream PMIx master branch.
I get the same error.
This is the configure command I ran:
./configure --with-slurm --disable-sphinx --with-pmix=/clusterfs/apps/openpmix --with-hwloc=/clusterfs/apps/hwloc --with-libevent=/clusterfs/apps/libevent --prefix=/clusterfs/apps/openmpi
And this is the configure output:
Open MPI configuration:
-----------------------
Version: 5.1.0a1
MPI Standard Version: 3.1
Build MPI C bindings: yes
Build MPI Fortran bindings: no
Build MPI Java bindings (experimental): no
Build Open SHMEM support: false (no spml)
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
Atomics: GCC built-in style atomics
Fault Tolerance support: mpi
HTML docs and man pages: no documentation available
hwloc: external
libevent: external
Open UCC: no
pmix: external
PRRTE: internal
Threading Package: pthreads
Transports
-----------------------
Cisco usNIC: no
Intel Omnipath (PSM2): no (not found)
Open UCX: no
OpenFabrics OFI Libfabric: no (not found)
Portals4: no (not found)
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Accelerators
-----------------------
CUDA support: no
Intel ZE support: no
ROCm support: no
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no (not found)
Lustre: no (not found)
Afraid I cannot help you much - something is quite wrong here. You should not be able to configure with an external hwloc, libevent, and pmix and then use an internal PRRTE. Configure is supposed to error out on that attempt, as they must either all be internal or all be external.
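A quick way to spot this kind of mix in a saved configure summary could look like the sketch below. The function name check_bundled_mix is made up for illustration and is not part of Open MPI. It assumes the summary lines look like the ones pasted above, i.e. hwloc:, libevent:, pmix: and PRRTE: each followed by internal or external.

```shell
# Hypothetical helper (not part of Open MPI): reads a configure summary on
# stdin and reports whether hwloc, libevent, pmix and PRRTE all agree.
check_bundled_mix() {
    awk '
        /^[ \t]*(hwloc|libevent|pmix|PRRTE):/ {
            print            # echo the relevant summary line
            seen[$NF] = 1    # last field is "internal" or "external"
        }
        END {
            n = 0
            for (k in seen) n++
            if (n > 1) print "mixed internal/external packages"
            exit (n > 1)     # non-zero exit status when the settings disagree
        }
    '
}
```

For example, after saving the configure output to a file (configure.out is just a placeholder name), running `check_bundled_mix < configure.out` prints the four lines and returns a non-zero status if they disagree.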
Setting that weirdness aside, I can only tell you that you are not in fact building against a head of the PMIx master branch. I don't know if you incorrectly checked out some other branch, or have some older PMIx install on your system, or...? I only know that PRRTE is looking at an old version of PMIx, which is what is causing the error.
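One way to check which PMIx a given prefix actually contains is to read the version out of its installed header. The sketch below assumes the install provides include/pmix_version.h with PMIX_VERSION_MAJOR, PMIX_VERSION_MINOR and PMIX_VERSION_RELEASE defines, possibly with an L suffix on the numbers. The function name pmix_header_version is hypothetical.

```shell
# Hypothetical helper: print MAJOR.MINOR.RELEASE from the pmix_version.h
# found under a given install prefix. Assumes the header defines
# PMIX_VERSION_MAJOR / PMIX_VERSION_MINOR / PMIX_VERSION_RELEASE.
pmix_header_version() {
    header="$1/include/pmix_version.h"
    if [ ! -f "$header" ]; then
        echo "no pmix_version.h under $1" >&2
        return 1
    fi
    awk '
        $1 == "#define" && $2 == "PMIX_VERSION_MAJOR"   { sub(/L$/, "", $3); major = $3 }
        $1 == "#define" && $2 == "PMIX_VERSION_MINOR"   { sub(/L$/, "", $3); minor = $3 }
        $1 == "#define" && $2 == "PMIX_VERSION_RELEASE" { sub(/L$/, "", $3); release = $3 }
        END { print major "." minor "." release }
    ' "$header"
}

# Example call, using the prefix from the configure line earlier in the thread:
# pmix_header_version /clusterfs/apps/openpmix
```

If the number printed is older than what the PMIx master branch currently reports, the prefix holds a stale install rather than a fresh build of master.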
Ok, thanks for the advice.
My hypothesis is that something somewhere is breaking because I'm on ARM.
Doubt that it has anything to do with ARM as many of us (myself included) operate regularly on that hardware. You should check to see if you have another PMIx install somewhere on the system that is causing the confusion. Try building everything internal (instead of using the external libs) and see if that works. Etc.
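To look for a second PMIx install, searching the usual prefixes for its version header is one option. This is only a sketch: find_pmix_headers is a made-up name, the directories in the example call are guesses at where a distro package or an earlier build might have landed, and it assumes every PMIx install ships a pmix_version.h.

```shell
# Hypothetical helper: list every pmix_version.h below the given directories.
# More than one hit suggests several PMIx installs are visible on the system.
find_pmix_headers() {
    for root in "$@"; do
        if [ -d "$root" ]; then
            find "$root" -type f -name pmix_version.h 2>/dev/null
        fi
    done
}

# Example call (directory list is an assumption, adjust for the cluster):
# find_pmix_headers /usr /usr/local /opt /clusterfs/apps
```

Any header found outside the prefix passed to --with-pmix is a candidate for the older PMIx that the build is picking up.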
For a university project I'm trying to build a Raspberry Pi cluster with Slurm.
I've had quite a few issues trying to run srun with MPI, so I've settled on installing Open MPI from the git repo, specifying external PMIx, hwloc, and libevent for PMIx/Slurm integration.
I'm building Open MPI version 5.1.0a1 on a Raspberry Pi 5 cluster managed with Slurm (the nodes run Raspberry Pi OS Lite).
What I've done so far:
HWLOC (v2.11 git clone)--->
./configure --disable-rsmi --prefix=/hwloc-install-prefix
make
make install

LIBEVENT (latest git clone)--->
./configure --prefix=/libevent-install-prefix
make
make install

OPENPMIX (latest git clone)--->
./configure --with-slurm --with-libevent=/libevent-install-prefix --with-hwloc=/hwloc-install-prefix --prefix=/pmix-install-prefix
make
make install

OPENMPI (latest git clone)--->
./configure --disable-sphinx --with-slurm --with-libevent=libevent-install-prefix --with-hwloc=hwloc-install-prefix --with-pmi=pmix-install-prefix --prefix=ompi-prefix
make --------> I fail here
(note that I'm disabling sphinx because I've not yet installed a python module on the cluster)
The output of pmix configure correctly indicates slurm support and the paths to external libevent and hwloc. Also the output of ompi configure correctly indicates pmi, libevent and hwloc as external.
When I run make for Open MPI, the build fails with this error: