drwootton opened this issue 2 years ago
I got some, maybe most, of them, but there are other issues that need a bit more thought. There are also a few corner cases where one of the processes gets killed by the OOM killer, and that's something you cannot trap in gdb. I'll push a PR soon for both #10186 and #10187.
Removing v5.0.x label - this will be a main-only change.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Open MPI main branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from current main branch (3/22/22)
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
Please describe the system on which you are running
Details of the problem
I ran the set of self-checking tests from ompi-tests-public/collective-big-count with collective components specified as --mca coll_adapt_priority 100 --mca coll adapt,basic,sm,self,inter,libnbc
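For concreteness, each failing run combines those MCA settings with one of the test binaries from ompi-tests-public/collective-big-count; a sketch of the invocation, where the process count and the binary name are assumptions (only the --mca arguments come from the report):

```shell
# Hypothetical invocation; raise coll_adapt_priority so the adapt
# component is selected, and restrict the coll framework to the
# components listed in the report.
mpirun -np 4 \
    --mca coll_adapt_priority 100 \
    --mca coll adapt,basic,sm,self,inter,libnbc \
    ./test_allgather_uniform_count
```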
The following testcases had failures. The remaining testcases were successful:
The tests were compiled by running make in the directory containing the source files
The following environment variables were set for all tests:
The following command failed in an MPI_Allgather call
This command fails with an assert and the following traceback:
The following command failed in a MPI_Allreduce call
The assert and traceback look similar:
The following command failed with a self-check that detected invalid results, followed by a SIGSEGV
The error message and traceback are:
The following command failed with an assert and traceback similar to test_allreduce_uniform_count, except that the failing MPI call is MPI_Alltoall:
The following command failed with an error message indicating a self-check failure, followed by a double free or storage corruption:
The following command failed with an assert and traceback similar to test_allreduce_uniform_count, except that the failing MPI call is MPI_Reduce:
The following command failed with a self-check message indicating the testcase generated invalid data