Open s417-lama opened 3 years ago
The problem seems to be in MPI_Win_lock_all, not MPI_Compare_and_swap. Just out of curiosity: why are you building against UCX and then using osc/rdma? Does it work when running with --mca osc ucx?
You're right. I got confused because I had been investigating a segfault in MPI_Compare_and_swap() with another version of Open MPI (v4.1.1).
It seems the segfault in MPI_Compare_and_swap() is resolved in the latest version, but another issue arose in MPI_Win_lock_all.
why are you building against UCX and then use osc/rdma?
This is because I wanted to compare their performance.
Does it work when running with --mca osc ucx?
It did work with --mca osc ucx, but not with --mca osc rdma.
Hmmm, btl/ofi was used. I will work on ensuring that btl/uct is selected when osc/rdma is used.
@hjelmn Shouldn't btl/ofi be the btl to use on Omni-Path systems? To enable btl/uct, I have to run with --mca btl_uct_memory_domains all, as otherwise btl/uct bails out.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
The current master branch: 65ca64f34e486b32be986f28356f8b0d0e3539ac
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From a git clone, as follows:
UCX v1.10.1 was built from a tarball.
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
Calling MPI_Compare_and_swap() in the "flat MPI" model, where multiple nodes are used and multiple processes run on each node, causes a segfault with the rdma osc.
The segfault did not occur with a single node or with multiple nodes having one process per node.
Minimum code example to reproduce segfault:
This program first initializes lock to 0, and then all processes issue MPI_Compare_and_swap() to lock at rank 0. The expected behavior is that only one process gets result = 0.
Running the above program with 4 processes on 2 nodes:
Output:
Running with -n 4 -N 1 (one process per node) and -n 4 -N 4 (only one node) did not cause a segfault.
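For readers unfamiliar with the expected behavior described above (only one process gets result = 0), here is a minimal single-process sketch of the compare-and-swap semantics. It is not the reproducer from this issue and does not use MPI: it swaps in C11 atomics on a local variable as an analogue, and the names cas_result and count_winners are hypothetical helpers invented for illustration. As with MPI_Compare_and_swap(), the returned value is what the lock held before the operation, so a result of 0 means that caller won the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper (not an MPI call): performs one compare-and-swap on a
 * local atomic word and returns the value the word held before the operation,
 * which is the same convention MPI_Compare_and_swap uses for its result. */
int cas_result(atomic_int *lock, int compare, int desired)
{
    int observed = compare;
    /* On success *lock becomes `desired` and `observed` still equals
     * `compare`; on failure `observed` is overwritten with the current
     * value of *lock and *lock is left unchanged. */
    atomic_compare_exchange_strong(lock, &observed, desired);
    return observed;
}

/* Hypothetical driver: `nprocs` simulated ranks each issue one CAS(0 -> 1)
 * against a lock initialized to 0. Returns how many saw result == 0. */
int count_winners(int nprocs)
{
    assert(nprocs > 0);

    atomic_int lock;
    atomic_init(&lock, 0);

    int winners = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        if (cas_result(&lock, 0, 1) == 0)
            winners++;
    }
    return winners;
}
```

Calling count_winners(4) models the 4-process run: the first simulated rank sees 0 and sets the lock to 1, and every later rank sees 1. The real program would instead get the same guarantee from the MPI library across nodes, which is the code path that crashes here.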