rafelamer opened 2 months ago
Is mpi02 on a shared NFS volume? It would also be helpful to double-check the linking:
ldd mpi02
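For example, to see whether the binary resolves the MPI libraries (and whether libfabric shows up among them), a quick check could look like this (a sketch; it assumes mpi02 is in the current directory, and note that libfabric may also be loaded indirectly as an Open MPI plugin rather than appearing in ldd output):

```shell
# List the shared libraries the binary resolves, filtering for
# MPI- and fabric-related entries
ldd ./mpi02 | grep -iE 'mpi|fabric'
```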
I don't know if it is relevant, but in Fedora 40 the openmpi library is linked against libfabric.
We can rule out libfabric with additional mca parameters
mpirun -np 16 --mca pml ob1 --mca btl tcp,self --hostfile ~/hosts ./mpi02
This prevents libfabric from being used.
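The same component selection can also be expressed through environment variables, which is sometimes more convenient in job scripts (a sketch; OMPI_MCA_<name> variables are Open MPI's standard equivalent of the corresponding --mca flags):

```shell
# Force the ob1 PML and the tcp/self BTLs, bypassing libfabric,
# using environment variables instead of --mca flags
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=tcp,self
mpirun -np 16 --hostfile ~/hosts ./mpi02
```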
Hi,
with this command
mpirun -np 16 --mca pml ob1 --mca btl tcp,self --hostfile ~/hosts ./mpi02
it works fine. So, it seems that the problem is related to libfabric.
Thanks, Rafel Amer
Thanks for checking. Just to clarify, do you intend to use libfabric at all?
I wonder how libfabric is configured on your system - we can move the discussion to the libfabric community if you wish.
$ dnf list installed | grep libfabric
$ dnf info <libfabric package name>
OK, I will subscribe to the Libfabric-users mailing list and then make a post.
Best regards, Rafel Amer
The libfabric community would need more information to investigate the issue.
As a starting point, you can enable the relevant verbose options in mpirun:
--mca btl_ofi_verbose 1 -x FI_LOG_LEVEL=info
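Combined with the hostfile used earlier in this thread, a full invocation might look like this (a sketch; it only adds the verbose flags above to the command you already ran):

```shell
# Run with OFI BTL verbosity and libfabric logging enabled,
# so the libfabric community can see what provider is selected
mpirun -np 16 --hostfile ~/hosts \
       --mca btl_ofi_verbose 1 -x FI_LOG_LEVEL=info \
       ./mpi02
```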
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
The version of Open MPI is 5.0.2.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
It was installed on Fedora 40 hosts with the command dnf install openmpi openmpi-devel
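On Fedora, the Open MPI binaries installed by these packages are typically exposed through environment modules rather than being on the default PATH, so it is worth confirming which mpirun is actually used on each host (the module name below is the usual Fedora one, but check with `module avail`):

```shell
# Make mpirun/mpicc from the Fedora openmpi package available
module load mpi/openmpi-x86_64
# Confirm which mpirun will be used
which mpirun
mpirun --version
```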
I don't know if it is relevant, but in Fedora 40 the openmpi library is linked against libfabric.
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
Please describe the system on which you are running
Details of the problem
I cannot run an MPI program on a 3-node cluster with IP addresses 195.201.223.246, 162.55.213.49 and 88.198.157.233. When I run
I get errors of the form
The contents of the hosts file are
Best regards, Rafel Amer