open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

segfault with HDF5 chunked file #7795

angainor opened this issue 4 years ago (Open)

angainor commented 4 years ago

I am looking at OpenMPI 4.0.3 and HDF5 1.10.6 compiled against it. A user reported a segfault in ADIOI_Flatten() when using a chunked dataset, i.e., when the following line is executed:

CALL h5pset_chunk_f(crp_list, 1, dims, ierr)

A simple Fortran reproducer is attached (compile with h5pfc ioerror.F90, run with mpirun -np 2 ./a.out); a rough sketch of what it does is included at the end of this comment. The same code works with Intel MPI. Here is the stack:

$ mpirun -np 2 ./a.out 
 myid, numprocs:           0           2
[b2368:169069:0:169069] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 169069) ====
 0 0x0000000000050ba5 ucs_debug_print_backtrace()  /build-result/src/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.8.x/src/ucs/debug/debug.c:625
 1 0x0000000000034278 ADIOI_Flatten()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:322
 2 0x0000000000035a6c ADIOI_Flatten_datatype()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:166
 3 0x000000000002c2d5 ADIO_Set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/ad_set_view.c:52
 4 0x0000000000013f26 mca_io_romio_dist_MPI_File_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/mpi-io/set_view.c:157
 5 0x000000000000cfb7 mca_io_romio321_file_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/src/io_romio321_file_open.c:237
 6 0x000000000007246e PMPI_File_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
 7 0x00000000002d34d4 H5FD_mpio_write()  H5FDmpio.c:0
 8 0x000000000011b5fe H5FD_write()  ???:0
 9 0x00000000000fab93 H5F__accum_write()  ???:0
10 0x00000000001f3d7b H5PB_write()  ???:0
11 0x00000000001050fb H5F_block_write()  ???:0
12 0x00000000000b5658 H5D__chunk_allocate()  ???:0
13 0x00000000000c4117 H5D__init_storage()  H5Dint.c:0
14 0x00000000000c964b H5D__alloc_storage()  ???:0
15 0x00000000000d02f5 H5D__layout_oh_create()  ???:0
16 0x00000000000c539c H5D__create()  ???:0
17 0x00000000000d101a H5O__dset_create()  H5Doh.c:0
18 0x00000000001abd53 H5O_obj_create()  ???:0
19 0x00000000001768f7 H5L__link_cb()  H5L.c:0
20 0x000000000014c6e3 H5G__traverse_real.isra.0()  H5Gtraverse.c:0
21 0x000000000014cb86 H5G_traverse()  ???:0
22 0x000000000017431e H5L__create_real.part.0()  H5L.c:0
23 0x0000000000177c36 H5L_link_object()  ???:0
24 0x00000000000c4c6f H5D__create_named()  ???:0
25 0x00000000000a2991 H5Dcreate2()  ???:0
26 0x0000000000034b60 h5dcreate_c()  ???:0
27 0x000000000002afba __h5d_MOD_h5dcreate_f()  ???:0
28 0x000000000040181d MAIN__()  /cluster/home/marcink/ioerror.F90:97
29 0x00000000004019ac main()  /cluster/home/marcink/ioerror.F90:3
30 0x0000000000022545 __libc_start_main()  ???:0
31 0x00000000004013c9 _start()  ???:0
=================================

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fa8569a062f in ???
#1  0x7fa833c3d278 in ADIOI_Flatten
    at adio/common/flatten.c:321
#2  0x7fa833c3ea6b in ADIOI_Flatten_datatype
    at adio/common/flatten.c:166
#3  0x7fa833c352d4 in ADIO_Set_view
    at adio/common/ad_set_view.c:52
#4  0x7fa833c1cf25 in mca_io_romio_dist_MPI_File_set_view
    at mpi-io/set_view.c:157
#5  0x7fa833c15fb6 in mca_io_romio321_file_set_view
    at src/io_romio321_file_open.c:237
#6  0x7fa85778546d in PMPI_File_set_view
    at /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
[...]

Could this be an OpenMPI problem, or do you think it is HDF5 that's causing it? I'd appreciate any help, thanks!

ioerror.zip
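
For reference, here is a rough sketch of what the attached reproducer does (the real code is ioerror.F90 inside ioerror.zip; the file name, dataset name, and sizes below are placeholders, not the exact values from the attachment):

PROGRAM chunked_io
  ! Minimal parallel-HDF5 sketch: open a file with the MPI-IO driver and create a 1-D chunked dataset.
  ! Creating the chunked dataset allocates chunk storage collectively, which is what ends up in
  ! MPI_File_set_view() / ADIOI_Flatten() in the backtrace above.
  USE hdf5
  USE mpi
  IMPLICIT NONE
  INTEGER :: ierr, myid, numprocs
  INTEGER(HID_T) :: fapl_id, file_id, filespace, crp_list, dset_id
  INTEGER(HSIZE_T), DIMENSION(1) :: dims  = (/ 1024_HSIZE_T /)
  INTEGER(HSIZE_T), DIMENSION(1) :: chunk = (/ 128_HSIZE_T /)

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)
  IF (myid == 0) WRITE(*,*) 'myid, numprocs:', myid, numprocs
  CALL h5open_f(ierr)

  ! Open the file collectively through the MPI-IO virtual file driver
  CALL h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, ierr)
  CALL h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, ierr)
  CALL h5fcreate_f('testfile.h5', H5F_ACC_TRUNC_F, file_id, ierr, access_prp=fapl_id)

  ! Dataset creation property list with chunking -- the call mentioned in the report
  CALL h5pcreate_f(H5P_DATASET_CREATE_F, crp_list, ierr)
  CALL h5pset_chunk_f(crp_list, 1, chunk, ierr)

  ! Creating the dataset triggers the chunk allocation / collective write path seen in the backtrace
  CALL h5screate_simple_f(1, dims, filespace, ierr)
  CALL h5dcreate_f(file_id, 'data', H5T_NATIVE_DOUBLE, filespace, dset_id, ierr, crp_list)

  CALL h5dclose_f(dset_id, ierr)
  CALL h5sclose_f(filespace, ierr)
  CALL h5pclose_f(crp_list, ierr)
  CALL h5pclose_f(fapl_id, ierr)
  CALL h5fclose_f(file_id, ierr)
  CALL h5close_f(ierr)
  CALL MPI_Finalize(ierr)
END PROGRAM chunked_io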

angainor commented 4 years ago

Just to add some info, I found that the segfault happens when the file is located on the Lustre file system. The same code works fine if the file is stored on a local disk, or on a BeeGFS share.

Can this have something to do with Lustre integration / version / etc? Does anyone have suggestions on how to debug this?

jsquyres commented 4 years ago

I'm afraid I don't know much about HDF. @edgargabriel any insight into this?

Does the same problem happen with OMPIO?
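
Forcing OMPIO for a single run should just be a matter of something like:

mpirun --mca io ompio -np 2 ./a.out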

angainor commented 4 years ago

@jsquyres Perfect, thanks! -mca io ompio works :) I'm not up to date here; is there a substantial difference between that and romio321?

jsquyres commented 4 years ago

ROMIO is an import of MPI-IO functionality from MPICH. Originally, ROMIO was a standalone MPI-IO library written at Argonne (back in the early days of MPI-2, when MPI-IO was new). It eventually got slurped up into MPICH itself. Ever since it was created, ROMIO has also been slurped up into other MPI implementations -- such as Open MPI. We've continued to import newer versions of ROMIO from MPICH over the years. I don't remember offhand which version of ROMIO we have, but perhaps it's got a bug in this case.

OMPIO is our own, native MPI-IO implementation -- wholly separate from ROMIO. It was spearheaded by Dr. Edgar Gabriel at U. Houston (i.e., @edgargabriel). OMPIO is Open MPI's default MPI-IO these days, except in a few cases (I don't remember which cases offhand, sorry!).

Put simply: OMPIO vs. ROMIO is just another run-time plugin/component decision in Open MPI, just like all the others. 😄 We tend to prefer OMPIO 😉, but we keep ROMIO because of its age and maturity, and simply because some people/apps have a preference for it and/or established/verified compatibility with it.
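
For example, ompi_info on your install should list both plugins (treat the exact grep pattern here as a sketch):

ompi_info | grep "MCA io"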

angainor commented 4 years ago

@jsquyres Thanks a lot, that's good to know! I run 4.0.3, which seems to use romio321 by default, at least on our system (maybe because of Lustre?):

[login-2.betzy.sigma2.no:05806] io:base:file_select: component available: ompio, priority: 1
[login-2.betzy.sigma2.no:05806] io:base:file_select: component available: romio321, priority: 10
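
(That output is from raising the io framework verbosity, i.e. running with something like --mca io_base_verbose 100; the parameter name is from memory, so double-check it.)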

I guess I will simply change that in openmpi-mca-params.conf if you say it should actually be the default.
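
For the record, the line I plan to put in openmpi-mca-params.conf (assuming the usual key = value syntax of that file) would simply be:

io = ompio

either in $PREFIX/etc/openmpi-mca-params.conf or in ~/.openmpi/mca-params.conf.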

angainor commented 4 years ago

@jsquyres And some more info: I checked OpenMPI 3.1.4 with romio314, and that works. So it seems to be something in the newer version.

edgargabriel commented 4 years ago

I am not sure I have much to contribute to this discussion; I haven't seen this bug with romio321 yet.

Generally speaking, romio is used by default on Lustre file systems (that's why it has a higher priority in this case), and ompio basically everywhere else. That being said, ompio does have support for Lustre as well, and we are working on some interesting features that, if they work out the way we hope, will let us switch to ompio on Lustre as well.

hppritcha commented 4 years ago

This didn't get fixed in 4.0.4.

roblatham00 commented 3 years ago

This sounds like a bug we fixed in ROMIO at some point in the last four years, but I haven't waded through the history to find what the fix might be. I would love to see a romio-332 in Open MPI -- it is only one year old.

hakostra commented 3 years ago

For reference: I just encountered the same error on a system with a GPFS filesystem as well, with OpenMPI 4.1.0-rc1.

tjahns commented 3 years ago

I ran into what seems like this bug when trying to build hdf5 1.12.0 on a system with OpenMPI 4.0.5. The system has Lustre (2.12.4.1_cray_139_g0763d21), and the backtrace is exactly like the one above from H5PB_write onward. This happens during make check with the testpar/testphdf5 unit test of hdf5; when run with

 ../libtool --mode=execute mpirun -mca io ompio -n 6 ./testphdf5

the test finishes successfully.

Unfortunately I lack the information to make a debug build of OpenMPI on that system that would exactly match the system version, but I'll try to get more information from the person who installed that package.

edgargabriel commented 3 years ago

@tjahns I am not sure whether it is relevant for your work or not, but note that ompio is now the default even on Lustre file systems, starting with the 4.1.x release. The romio component in Open MPI will also be updated to resolve these issues, but I am not 100% sure what the status of that effort is.

jsquyres commented 3 years ago

While OMPIO resolves this issue, we haven't finished the ROMIO update yet because we're waiting for some fixes from upstream. See #8371.

roblatham00 commented 3 years ago

Just one more item to resolve: https://github.com/pmodels/mpich/pull/5101 -- thanks for your patience.

gpaulsen commented 3 years ago

Not going to update ROMIO in v4.0.x. Removing label.

bwbarrett commented 2 years ago

Moved the milestone to 5.0 and removed the 4.1.x label; given that OMPIO works for this use case, we're not going to backport significant ROMIO changes into 4.1.x at this point.