ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Bug: Zero-Sized Reads Should Never Fail #3459

Open ax3l opened 1 year ago

ax3l commented 1 year ago

In ADIOS 2.8.3, requesting a zero-sized read from a NULL block segfaults.

#6  0x00002002126f1a78 in void adios2::format::BP4Deserializer::PostDataRead<double>(adios2::core::Variable<double>&, adios2::core::Variable<double>::Info&, adios2::helper::SubStreamBoxInfo const&, bool, unsigned long) () from /autofs/nccs-svm1_home1/huebl/sw/venvs/warpx/lib/python3.8/site-packages/openpmd_api/openpmd_api_cxx.cpython-38-powerpc64le-linux-gnu.so
#7  0x000020021252bfe8 in void adios2::core::engine::BP4Reader::ReadVariableBlocks<double>(adios2::core::Variable<double>&) ()
   from /autofs/nccs-svm1_home1/huebl/sw/venvs/warpx/lib/python3.8/site-packages/openpmd_api/openpmd_api_cxx.cpython-38-powerpc64le-linux-gnu.so
#8  0x000020021252624c in adios2::core::engine::BP4Reader::PerformGets() ()
   from /autofs/nccs-svm1_home1/huebl/sw/venvs/warpx/lib/python3.8/site-packages/openpmd_api/openpmd_api_cxx.cpython-38-powerpc64le-linux-gnu.so
#9  0x000020021235d3d4 in adios2::Engine::PerformGets() () from /autofs/nccs-svm1_home1/huebl/sw/venvs/warpx/lib/python3.8/site-packages/openpmd_api/openpmd_api_cxx.cpython-38-powerpc64le-linux-gnu.so
#10 0x00002002122afa98 in openPMD::detail::BufferedActions::flush(openPMD::FlushLevel, bool) ()
   from /autofs/nccs-svm1_home1/huebl/sw/venvs/warpx/lib/python3.8/site-packages/openpmd_api/openpmd_api_cxx.cpython-38-powerpc64le-linux-gnu.so
#11 0x00002002122afd84 in openPMD::ADIOS2IOHandlerImpl::flush() ()

This should be made more robust.
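A minimal sketch of what "more robust" could mean for a read routine, under stated assumptions: `ByteSource` and `readInto` are hypothetical names for illustration and are not ADIOS2 types or functions. The idea is that a zero-sized request returns before the destination pointer or the offset is looked at, so an empty block with a null buffer is a no-op rather than a crash.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory stand-in for a file or transport (not an ADIOS2 type).
struct ByteSource
{
    std::vector<char> bytes;
};

// Copy `size` bytes starting at offset `start` from `src` into `dest`.
// A zero-sized request returns before `dest` or `start` is inspected,
// so a null destination is acceptable for an empty block.
inline void readInto(const ByteSource &src, char *dest, std::size_t size,
                     std::size_t start)
{
    if (size == 0)
    {
        return; // nothing to read, nothing to validate
    }
    if (dest == nullptr)
    {
        throw std::invalid_argument("non-empty read into a null buffer");
    }
    // Written this way to avoid overflow in `start + size`.
    if (start > src.bytes.size() || size > src.bytes.size() - start)
    {
        throw std::out_of_range("read exceeds source size");
    }
    std::memcpy(dest, src.bytes.data() + start, size);
}
```

With this shape, `readInto(src, nullptr, 0, 0)` simply returns, while a non-empty read into a null buffer is reported as an exception instead of a segfault.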

attn @pnorbert @guj cc @franzpoeschel

X-Ref:

vicentebolea commented 1 year ago

This might have been resolved in https://github.com/ornladios/ADIOS2/pull/3542/files

anagainaru commented 1 year ago

@vicentebolea That is what I hope as well.

@ax3l could you try adios v2.9 or tell me what code to run to reproduce this?

ax3l commented 1 year ago

Sorry for moving so fast and for the deeply nested references. I was going down a rabbit hole of bugs when I saw this, while patching through 3+ repos.

More a memo to myself: to reproduce this, I need an MPI-parallel run of WarpX with back-transformed diagnostics that writes no particles from some of the ranks due to vacuum. This will spill a few NULL-sized blocks into the ADIOS2 BP files, which is our workaround for another bug/API contract issue that does not work for us (#3455).

ax3l commented 1 year ago

@pnorbert @vicentebolea @anagainaru shall we add a general test to ADIOS2 for zero-write blocks, with and without compression? This is a pretty common corner case for us and can easily break :)

X-ref:

ax3l commented 1 year ago

Validation Steps

$ bpls -D ...
...
  double    /data/10/particles/electrons/position/z    {68368}
        step 0: 
          block  0: [null       ]
          block  1: [null       ]
          block  2: [null       ]
          block  3: [null       ]
          block  4: [null       ]
          block  5: [null       ]
          block  6: [null       ]
          block  7: [null       ]
          block  8: [null       ]
          block  9: [null       ]
          block 10: [34710:42094]
          block 11: [null       ]
          block 12: [50810:60201]
          block 13: [null       ]
          block 14: [null       ]
          block 15: [18040:26991]
          block 16: [26992:34709]
          block 17: [null       ]
          block 18: [null       ]
          block 19: [null       ]
          block 20: [null       ]
          block 21: [null       ]
          block 22: [null       ]
          block 23: [null       ]
          block 24: [null       ]
          block 25: [null       ]
          block 26: [null       ]
          block 27: [null       ]
          block 28: [null       ]
          block 29: [null       ]
          block 30: [null       ]
          block 31: [null       ]
          block 32: [null       ]
          block 33: [null       ]
          block 34: [null       ]
          block 35: [null       ]
          block 36: [null       ]
          block 37: [null       ]
          block 38: [null       ]
          block 39: [42095:50809]
          block 40: [null       ]
          block 41: [60202:68367]
          block 42: [ 9840:18039]
          block 43: [null       ]
          block 44: [null       ]
          block 45: [null       ]
          block 46: [null       ]
          block 47: [null       ]
          block 48: [null       ]
          block 49: [    0: 9839]
          block 50: [null       ]
          block 51: [null       ]
          block 52: [null       ]
          block 53: [null       ]
          block 54: [null       ]
          block 55: [null       ]
          block 56: [null       ]
...
  double    /data/10/particles/electrons_n/position/z  {8781}
        step 0: 
          block 0: [null     ]
          block 1: [8488:8780]
          block 2: [null     ]
          block 3: [null     ]
          block 4: [null     ]
          block 5: [   0:8487]
          block 6: [null     ]
          block 7: [null     ]
          block 8: [null     ]
...
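To check how sparse a variable is without reading the listing by eye, the block lines can be tallied. The sketch below assumes the `bpls -D` text has already been captured into a string and that block lines look like the ones shown here (`block N: [null ...]` or `block N: [start:end]`); `BlockCounts` and `countBlocks` are hypothetical helper names, not part of ADIOS2 or bpls. In the electrons excerpt, 8 of the 57 listed blocks hold data.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Tally of block lines found in a captured `bpls -D` listing.
struct BlockCounts
{
    std::size_t empty = 0;    // lines of the form "block N: [null ...]"
    std::size_t nonEmpty = 0; // lines of the form "block N: [start:end]"
};

// Count empty and non-empty blocks, ignoring every line that does not
// start (after indentation) with the word "block".
inline BlockCounts countBlocks(const std::string &listing)
{
    BlockCounts counts;
    std::istringstream in(listing);
    std::string line;
    while (std::getline(in, line))
    {
        const std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos ||
            line.compare(first, 5, "block") != 0)
        {
            continue; // variable header, "step" line, or "..."
        }
        const std::size_t open = line.find('[', first);
        if (open == std::string::npos)
        {
            continue; // malformed block line, skip it
        }
        if (line.compare(open + 1, 4, "null") == 0)
        {
            ++counts.empty;
        }
        else
        {
            ++counts.nonEmpty;
        }
    }
    return counts;
}
```

Feeding it the text of one variable at a time gives a per-variable count of NULL blocks, which is the case a regression test would need to cover.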

Result

python3 -X faulthandler -c 'from openpmd_viewer.addons import LpaDiagnostics; ts_bp = LpaDiagnostics("diag1/"); print(ts_bp.get_particle(species="electrons", var_list=["charge", "id", "mass", "theta", "x", "y", "z", "ux", "uy", "uz", "w"], iteration=10))'

This still segfaults here:

Current thread 0x00007f8525bcfb80 (most recent call first):
  File "/global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py", line 80 in get_data
  File "/global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/particle_reader.py", line 73 in read_species_data
  File "/global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/data_reader.py", line 293 in read_species_data
  File "/global/homes/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/main.py", line 274 in get_particle
  File "<string>", line 1 in <module>

Calling into here:

#1  0x00007ffbba85cc38 in openPMD::Series::flushFileBased(std::_Rb_tree_iterator<std::pair<unsigned long const, openPMD::Iteration> >, std::_Rb_tree_iterator<std::pair<unsigned long const, openPMD::Iteration> >, openPMD::internal::FlushParams, bool) () from /global/u2/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_api/../../../libopenPMD.so
#2  0x00007ffbba85dc3c in openPMD::Series::flush_impl(std::_Rb_tree_iterator<std::pair<unsigned long const, openPMD::Iteration> >, std::_Rb_tree_iterator<std::pair<unsigned long const, openPMD::Iteration> >, openPMD::internal::FlushParams, bool) () from /global/u2/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_api/../../../libopenPMD.so
#3  0x00007ffbba85dd4e in openPMD::Series::flush(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
   from /global/u2/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_api/../../../libopenPMD.so
#4  0x00007ffbbabd28ef in ?? () from /global/u2/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#5  0x00007ffbbaae72d3 in ?? () from /global/u2/a/ahuebl/.conda/envs/perlmutter-postprocessing/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#6  0x00005555556946b7 in cfunction_call (func=0x7ffbbad1f650, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.10.10/Objects/methodobject.c:543
...

@franzpoeschel this might be us now? Gotta recompile w/ debug symbols.

Follow-Up

Can we nonetheless add a read test to ADIOS2 CI to make sure NULL-blocks will always work? I would like to cover:

BP4 and BP5
With and without compressors (ideally blosc and/or mgard)

ax3l commented 1 year ago

Backtrace with debug symbols in openPMD-api... might be in ADIOS2 after all.

Thread 1 "python3" received signal SIGABRT, Aborted.
0x00007ffff7c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007ffff7c969fc in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7c42476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7c287f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff7c89676 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff7ca0cfc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff7ca0fdc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff7ca549a in free () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fff4c57f904 in adios2::transportman::TransportMan::ReadFile(char*, unsigned long, unsigned long, unsigned long) () from /home/axel/micromamba/envs/warpx-openmp-dev/lib/./libadios2_core.so.2
#8  0x00007fff4c34d379 in void adios2::core::engine::BP4Reader::ReadVariableBlocks<unsigned long>(adios2::core::Variable<unsigned long>&) () from /home/axel/micromamba/envs/warpx-openmp-dev/lib/./libadios2_core.so.2
#9  0x00007fff4c34f6dc in adios2::core::engine::BP4Reader::PerformGets() () from /home/axel/micromamba/envs/warpx-openmp-dev/lib/./libadios2_core.so.2
#10 0x00007fff50470123 in operator() (ba=..., eng=..., __closure=<optimized out>) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3540
#11 openPMD::detail::BufferedActions::flush_impl<openPMD::detail::BufferedActions::flush_impl(ADIOS2FlushParams, bool)::<lambda(openPMD::detail::BufferedActions&, adios2::Engine&)> > (flushParams=..., 
    flushUnconditionally=false, writeLatePuts=<optimized out>, performPutGets=..., this=0x55555713e8c0) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3435
#12 openPMD::detail::BufferedActions::flush_impl (this=0x55555713e8c0, flushParams=..., writeLatePuts=<optimized out>) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3526
#13 0x00007fff50499357 in openPMD::detail::BufferedActions::flush<openPMD::detail::BufferedActions::ADIOS2FlushParams&, bool> (this=0x55555713e8c0)
    at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3319
#14 openPMD::ADIOS2IOHandlerImpl::flush (this=<optimized out>, flushParams=...) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:466
#15 0x00007fff504997f5 in openPMD::ADIOS2IOHandler::flush (this=<optimized out>, flushParams=...) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3788
#16 0x00007fff503f72ec in operator() (__closure=<optimized out>) at /home/axel/src/openPMD/openPMD-api/src/IO/AbstractIOHandler.cpp:34
#17 openPMD::AbstractIOHandler::flush (this=0x5555577f2560, params=...) at /home/axel/src/openPMD/openPMD-api/src/IO/AbstractIOHandler.cpp:41
#18 0x00007fff5034e37f in openPMD::Series::flushFileBased (this=0x555557b00800, begin=..., end=..., flushParams=..., flushIOHandler=true) at /home/axel/src/openPMD/openPMD-api/src/Series.cpp:791
#19 0x00007fff5034f71c in openPMD::Series::flush_impl (this=0x555557b00800, begin=..., end=..., flushParams=..., flushIOHandler=<optimized out>) at /home/axel/src/openPMD/openPMD-api/src/Series.cpp:720
#20 0x00007fff5034f82e in openPMD::Series::flush (this=<optimized out>, backendConfig=...) at /home/axel/src/openPMD/openPMD-api/src/Series.cpp:401
#21 0x00007fff5078c4e9 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, openPMD::Series, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg_v>(void (openPMD::Series::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg_v const&)::{lambda(openPMD::Series*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)#1}, void, openPMD::Series*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg_v>(pybind11::cpp_function::initialize<void, openPMD::Series, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg_v>(void (openPMD::Series::*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg_v const&)::{lambda(openPMD::Series*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)#1}&&, void (*)(openPMD::Series*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () at /home/axel/src/openPMD/openPMD-api/share/openPMD/thirdParty/pybind11/include/pybind11/pybind11.h:109
#22 0x00007fff506ae68e in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=0x7fff47751de0, kwargs_in=0x0)
    at /home/axel/src/openPMD/openPMD-api/share/openPMD/thirdParty/pybind11/include/pybind11/pybind11.h:929
#23 0x0000555555752c56 in cfunction_call (func=0x7fff508eac50, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.6/Objects/methodobject.c:542
#24 0x0000555555732bd3 in _PyObject_MakeTpCall (tstate=0x555555acc1f8 <_PyRuntime+166328>, callable=0x7fff508eac50, args=<optimized out>, nargs=1, keywords=0x0) at /usr/local/src/conda/python-3.11.6/Objects/call.c:214
#25 0x000055555573f463 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.6/Python/ceval.c:4760
#26 0x00005555557f5cbd in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb0020, tstate=0x555555acc1f8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.6/Include/internal/pycore_ceval.h:73
#27 _PyEval_Vector (tstate=0x555555acc1f8 <_PyRuntime+166328>, func=0x7ffff75da020, locals=<optimized out>, args=0x0, argcount=0, kwnames=0x0) at /usr/local/src/conda/python-3.11.6/Python/ceval.c:6425
#28 0x00005555557f534f in PyEval_EvalCode (co=0x7ffff75b8750, globals=<optimized out>, locals=0x7ffff75f33c0) at /usr/local/src/conda/python-3.11.6/Python/ceval.c:1140
#29 0x000055555581412a in run_eval_code_obj (tstate=0x555555acc1f8 <_PyRuntime+166328>, co=0x7ffff75b8750, globals=0x7ffff75f33c0, locals=0x7ffff75f33c0) at /usr/local/src/conda/python-3.11.6/Python/pythonrun.c:1710
#30 0x0000555555810433 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff75f33c0, locals=0x7ffff75f33c0, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.11.6/Python/pythonrun.c:1731
#31 0x0000555555804482 in PyRun_StringFlags (str=<optimized out>, start=<optimized out>, globals=0x7ffff75f33c0, locals=0x7ffff75f33c0, flags=0x7fffffffbea8) at /usr/local/src/conda/python-3.11.6/Python/pythonrun.c:1601
#32 0x000055555580435c in PyRun_SimpleStringFlags (
    command=0x7ffff75718c0 "from openpmd_viewer.addons import LpaDiagnostics; ts_bp = LpaDiagnostics(\"diag1/\"); print(ts_bp.get_particle(species=\"electrons\", var_list=[\"charge\", \"id\", \"mass\", \"theta\", \"x\", \"y\", \"z\", \"ux\", \"uy\", "..., flags=0x7fffffffbea8) at /usr/local/src/conda/python-3.11.6/Python/pythonrun.c:487
#33 0x000055555581ecef in pymain_run_command (command=<optimized out>) at /usr/local/src/conda/python-3.11.6/Modules/main.c:255
#34 pymain_run_python (exitcode=0x7fffffffbea4) at /usr/local/src/conda/python-3.11.6/Modules/main.c:592
#35 Py_RunMain () at /usr/local/src/conda/python-3.11.6/Modules/main.c:680
#36 0x00005555557e4007 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.6/Modules/main.c:734
#37 0x00007ffff7c29d90 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#38 0x00007ffff7c29e40 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#39 0x00005555557e3ead in _start ()

Oh... do I need to test this with ADIOS2 post-2.9.1-master...? got it...

vicentebolea commented 1 year ago

Can we nonetheless add a read test to ADIOS2 CI to make sure NULL-blocks will always work? I would like to cover:

BP4 and BP5
With and without compressors (ideally blosc and/or mgard)

There is a ticket for this at https://github.com/ornladios/ADIOS2/issues/3792

ax3l commented 1 year ago

Thanks.

Testing again with ADIOS2 v2.9.1-379-g14a0a3a8a on the above data set.

Still a segfault, but I think it might be openPMD-api internal now :tada:

As of openPMD-api 0.15.2:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007ffff7d99f79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007ffff7d99f79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fffc374ea0c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
   from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fffc1a5afb6 in std::operator< <char, std::char_traits<char>, std::allocator<char> > (__lhs=<error: Cannot access memory at address 0x700d365c8>, __rhs="uint64_t") at /usr/include/c++/11/bits/basic_string.h:6343
#3  0x00007fffc1a5aa0d in std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (
    this=0x7fffc2118ba0 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map>, __x=<error: Cannot access memory at address 0x700d365c8>, 
    __y="uint64_t") at /usr/include/c++/11/bits/stl_function.h:400
#4  0x00007fffc1e931ff in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype> > >::_M_lower_bound (
    this=0x7fffc2118ba0 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map>, __x=0x5555571ebea0, 
    __y=0x7fffc2118ba8 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map+8>, __k="uint64_t") at /usr/include/c++/11/bits/stl_tree.h:1905
#5  0x00007fffc1e8d94a in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype> > >::find (
    this=0x7fffc2118ba0 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map>, __k="uint64_t") at /usr/include/c++/11/bits/stl_tree.h:2523
#6  0x00007fffc1e8d051 in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, openPMD::Datatype, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openPMD::Datatype> > >::find (
    this=0x7fffc2118ba0 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map>, __x="uint64_t") at /usr/include/c++/11/bits/stl_map.h:1170
#7  0x00007fffc1e8c2d2 in openPMD::detail::fromADIOS2Type (dt="uint64_t", verbose=false) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2Auxiliary.cpp:103
#8  0x00007fffc1e41757 in openPMD::ADIOS2IOHandlerImpl::verifyDataset<unsigned long> (this=0x5555575542e0, offset=std::vector of length 1, capacity 1 = {...}, extent=std::vector of length 1, capacity 1 = {...}, IO=..., 
    varName="/data/10/particles/electrons/id") at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:1569
#9  0x00007fffc1e22076 in openPMD::detail::DatasetReader::call<unsigned long> (impl=0x5555575542e0, bp=..., IO=..., engine=..., fileName="diag1/openpmd_000010.bp")
    at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:1611
#10 0x00007fffc1de8631 in openPMD::switchAdios2VariableType<openPMD::detail::DatasetReader, openPMD::ADIOS2IOHandlerImpl*&, openPMD::detail::BufferedGet&, adios2::IO&, adios2::Engine&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (dt=openPMD::Datatype::ULONG) at /home/axel/src/openPMD/openPMD-api/include/openPMD/IO/ADIOS/ADIOS2Auxiliary.hpp:248
#11 0x00007fffc1dd0031 in openPMD::detail::BufferedGet::run (this=0x5555575a8360, ba=...) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:2457
#12 0x00007fffc1dd73e6 in openPMD::detail::BufferedActions::flush_impl<openPMD::detail::BufferedActions::flush_impl(openPMD::detail::BufferedActions::ADIOS2FlushParams, bool)::<lambda(openPMD::detail::BufferedActions&, adios2::Engine&)> >(openPMD::detail::BufferedActions::ADIOS2FlushParams, struct {...} &&, bool, bool) (this=0x55555757b600, flushParams=..., performPutGets=..., writeLatePuts=false, flushUnconditionally=false)
    at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3400
#13 0x00007fffc1dd5465 in openPMD::detail::BufferedActions::flush_impl (this=0x55555757b600, flushParams=..., writeLatePuts=false) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3526
#14 0x00007fffc1de0a08 in openPMD::detail::BufferedActions::flush<openPMD::detail::BufferedActions::ADIOS2FlushParams&, bool> (this=0x55555757b600)
    at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3323
#15 0x00007fffc1dc7c8f in openPMD::ADIOS2IOHandlerImpl::flush (this=0x5555575542e0, flushParams=...) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:466
#16 0x00007fffc1dd6d27 in openPMD::ADIOS2IOHandler::flush (this=0x555557554250, flushParams=...) at /home/axel/src/openPMD/openPMD-api/src/IO/ADIOS/ADIOS2IOHandler.cpp:3788
#17 0x00007fffc1d83c33 in operator() (__closure=0x7fffffffc7a0) at /home/axel/src/openPMD/openPMD-api/src/IO/AbstractIOHandler.cpp:34
#18 0x00007fffc1d83cf1 in openPMD::AbstractIOHandler::flush (this=0x555557554250, params=...) at /home/axel/src/openPMD/openPMD-api/src/IO/AbstractIOHandler.cpp:41
#19 0x00007fffc1beb38f in openPMD::Series::flushFileBased (this=0x5555575033a0, 
      begin={first = 10, second = {<openPMD::Attributable> = {_vptr.Attributable = 0x7fffc20c02e8 <vtable for openPMD::Iteration+16>, m_attri = std::shared_ptr<openPMD::internal::AttributableData> (use count 4, weak count 0) = {

In:

With a bit of printf debugging:

...
fromADIOS2Type: double
fromADIOS2Type: uint64_t
<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast
fromADIOS2Type: uint64_t
[AbstractIOHandlerImpl] IO Task READ_DATASET failed with exception. Clearing IO queue and passing on the exception.
Segmentation fault (core dumped)

Removing the try-catch there makes the first error in the dataset read show up here:

franzpoeschel commented 1 year ago
#3  0x00007fffc1a5aa0d in std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (
    this=0x7fffc2118ba0 <openPMD::detail::fromADIOS2Type(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)::map>, __x=<error: Cannot access memory at address 0x700d365c8>, 
    __y="uint64_t") at /usr/include/c++/11/bits/stl_function.h:400

The value __x, which comes from the std::map, seems to be broken in ADIOS2Auxiliary.cpp. Can you check whether changing static std::map<std::string, Datatype> map to std::map<std::string, Datatype> const map makes a difference? Ideally, we can at least use thread_local std::map<std::string, Datatype> const map, since this function is called very often and I'd like to avoid constructing the std::map on each invocation.
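For reference, a minimal sketch of the thread_local const variant suggested above. This is not the openPMD-api source: the `Datatype` enum is a reduced stand-in and `lookupDatatype` is a hypothetical name. Each thread builds its own map once, on its first call, and the `const` qualifier prevents later modification through that name. Whether this makes the crash go away is exactly what the experiment is meant to find out.

```cpp
#include <map>
#include <string>

// Reduced stand-in for openPMD::Datatype, for illustration only.
enum class Datatype
{
    DOUBLE,
    ULONG,
    UNDEFINED
};

// Hypothetical lookup helper mirroring the shape of a string-to-enum map.
inline Datatype lookupDatatype(const std::string &name)
{
    // Block-scope thread_local implies static storage per thread: the map
    // is constructed once per thread, not on every invocation.
    thread_local const std::map<std::string, Datatype> map{
        {"double", Datatype::DOUBLE},
        {"uint64_t", Datatype::ULONG},
    };
    const auto it = map.find(name);
    return it == map.end() ? Datatype::UNDEFINED : it->second;
}
```

Dropping `thread_local` gives the plain `std::map<...> const map` variant, which rebuilds the map on every call and is therefore mainly useful as a diagnostic step.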

pnorbert commented 9 months ago

Does anyone have an update on this? Is this still an issue with the master branch?