Closed zhngaj closed 4 years ago
Thanks for the bug report and for the analysis. We should have not used the aligned access primitive for SSE. I have made a PR (#7957) and added a test case to validate unaligned memory accesses.
Thanks for the fix. With PR #7957, I did not hit the segfault with IMB-EXT.
Going to reopen this to track the cherry-pick of https://github.com/open-mpi/ompi/pull/7957 into v4.1.x, which also has the MPI_OP support using vectorized instructions.
FYI: #7997 filed to cherry-pick this to v4.1.x branch.
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.x branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
.There's no output from
git submodule status
after I checked out v4.1.x branch. I can see its output with master branch though.Please describe the system on which you are running
Details of the problem
Built libfabric master (b01932dfb (HEAD -> master, origin/master, origin/HEAD) Merge pull request #6103 from wzamazon/efa_fix_readmsg)
Built Open MPI v4.1.x branch (bd16024a0b (HEAD -> v4.1.x, origin/v4.1.x) Merge pull request #7946 from gpaulsen/topic/v4.1.x/README_for_SLURM_binding) with libfabric
Built intel-mpi-benchmark 2019 U6 (https://github.com/intel/mpi-benchmarks/tree/IMB-v2019.6) with Open MPI
Ran IMB-EXT Accumulate test with 2 MPI processes, and hit the segfault.
Calling sequence was:
/fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT Accumulate -npmin 2 -iter 200
Minimum message length in bytes: 0
Maximum message length in bytes: 4194304
#
MPI_Datatype : MPI_BYTE
MPI_Datatype for reductions : MPI_FLOAT
MPI_Op : MPI_SUM
# #
List of Benchmarks to run:
Accumulate
-----------------------------------------------------------------------------
Benchmarking Accumulate
processes = 2
-----------------------------------------------------------------------------
#
MODE: AGGREGATE
#
bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
[ip-172-31-13-230:12036] Process received signal [ip-172-31-13-230:12036] Signal: Segmentation fault (11) [ip-172-31-13-230:12036] Signal code: (-6) [ip-172-31-13-230:12036] Failing at address: 0x1f400002f04 [ip-172-31-13-230:12036] [ 0] /lib64/libpthread.so.0(+0xf600)[0x7f4b6ba25600] [ip-172-31-13-230:12036] [ 1] /fsx/ompi/install/lib/openmpi/mca_op_avx.so(+0x3e738)[0x7f4b60f23738] [ip-172-31-13-230:12036] [ 2] /fsx/ompi/install/lib/openmpi/mca_osc_rdma.so(+0x9f5d)[0x7f4b51bd1f5d] [ip-172-31-13-230:12036] [ 3] /fsx/ompi/install/lib/openmpi/mca_osc_rdma.so(+0xc637)[0x7f4b51bd4637] [ip-172-31-13-230:12036] [ 4] /fsx/ompi/install/lib/openmpi/mca_osc_rdma.so(+0xca19)[0x7f4b51bd4a19] [ip-172-31-13-230:12036] [ 5] /fsx/ompi/install/lib/openmpi/mca_osc_rdma.so(+0xffbf)[0x7f4b51bd7fbf] [ip-172-31-13-230:12036] [ 6] /fsx/ompi/install/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_accumulate+0x10f)[0x7f4b51bd86ed] [ip-172-31-13-230:12036] [ 7] /fsx/ompi/install/lib/libmpi.so.40(PMPI_Accumulate+0x421)[0x7f4b6c5a8967] [ip-172-31-13-230:12036] [ 8] /fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT[0x43da5c] [ip-172-31-13-230:12036] [ 9] /fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT[0x42c3aa] [ip-172-31-13-230:12036] [10] /fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT[0x432d44] [ip-172-31-13-230:12036] [11] /fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT[0x405513] [ip-172-31-13-230:12036] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4b6b66a575] [ip-172-31-13-230:12036] [13] /fsx/SubspaceBenchmarks/spack/opt/spack/linux-amzn2018-x86_64/gcc-4.8.5/intel-mpi-benchmarks-2019.6-tnzpd3z7s4mgsvchsl2ofn2cgw3aonty/bin/IMB-EXT[0x403d29] [ip-172-31-13-230:12036] End of error message
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 1 with PID 12036 on node ip-172-31-13-230 exited on signal 11 (Segmentation fault).
0 0x00007f98385f8763 in _mm256_load_ps (__P=0xef70d0) at /usr/lib/gcc/x86_64-amazon-linux/7/include/avxintrin.h:873
873 return (__m256 )__P; Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.180.amzn1.x86_64 libibverbs-28.amzn0-1.amzn1.x86_64 libnl3-3.2.28-4.6.amzn1.x86_64 libpciaccess-0.13.1-4.1.11.amzn1.x86_64 zlib-1.2.8-7.18.amzn1.x86_64 (gdb) bt
0 0x00007f98385f8763 in _mm256_load_ps (__P=0xef70d0) at /usr/lib/gcc/x86_64-amazon-linux/7/include/avxintrin.h:873
1 ompi_op_avx_2buff_add_float_avx512 (_in=0xef70a0, _out=0xef70d0, count=0x7fff324b9b94, dtype=0x7fff324b9b88, module=0xbcfb30)
2 0x00007f983fb80a17 in ompi_op_reduce (op=0x661180, source=0xef70a0, target=0xef70d0, count=8,
3 0x00007f983fb8114a in ompi_osc_base_sndrcv_op (origin=0xef70a0, origin_count=8, origin_dt=0x662e00,
4 0x00007f9824fd4293 in ompi_osc_rdma_gacc_local (source_buffer=0xef70a0, source_count=8,
5 0x00007f9824fd7f71 in ompi_osc_rdma_rget_accumulate_internal (sync=0xee9310, origin_addr=0xef70a0, origin_count=8,
6 0x00007f9824fd86ed in ompi_osc_rdma_accumulate (origin_addr=0xef70a0, origin_count=8,
7 0x00007f983fb2b967 in PMPI_Accumulate (origin_addr=0xef70a0, origin_count=8, origin_datatype=0x662e00,
8 0x000000000043da5c in IMB_accumulate ()
9 0x000000000042c3aa in Bmark_descr::IMB_init_buffers_iter(comm_info, iter_schedule, Bench, cmode, int, int) ()
10 0x0000000000432d44 in OriginalBenchmark<BenchmarkSuite<(benchmark_suite_t)4>, &IMB_accumulate>::run(scope_item const&) ()
11 0x0000000000405513 in main ()
(gdb) f 0
0 0x00007f98385f8763 in _mm256_load_ps (__P=0xef70d0) at /usr/lib/gcc/x86_64-amazon-linux/7/include/avxintrin.h:873
873 return (__m256 )__P; (gdb) p 0xef70d0 % 32 $2 = 16