Open shijin-aws opened 2 years ago
There seem to be multiple issues. The following PR fixed one:
https://github.com/open-mpi/ompi/pull/10462
With this PR, 1sided passes.
Another PR:
https://github.com/open-mpi/ompi/pull/10463
This fixed the segfault in pp_1sided and halo_1sided_put_alloc_mem.
The hang in c_accumulate with EFA turns out to be a bug in the libfabric EFA installer. The fix is in https://github.com/ofiwg/libfabric/pull/7829. It will take a while for MTT to ingest the change.
Remaining issues are:
c_get_accumulate_ddt1 and c_get_accumulate_ddt2
c_accumulate is quite slow; not sure whether that is normal.
The c_put_dynamic_self/c_get_dynamic_self hang will be fixed by PR https://github.com/open-mpi/ompi/pull/10473
Remaining issues:
With btl/ofi, mt_1sided segfaults.
With btl/tcp, several tests (1sided, c_accumulate, etc.) hang.
c_get_accumulate_ddt1 and c_get_accumulate_ddt2 segfault.
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v5.0.x branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Part
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
There are around 40 ibm test suite failures for ompi v5.0.x with the tcp path. The full test report can be found in this mtt report.