mala-project / mala

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.
https://mala-project.github.io/mala/
BSD 3-Clause "New" or "Revised" License

OpenPMD interface fails after LAMMPS usage #476

Open RandomDefaultUser opened 1 year ago

RandomDefaultUser commented 1 year ago

While investigating a problem with the test pipeline, I found that attempting an OpenPMD write after LAMMPS has been used in any capacity results in a crash. An MWE to reproduce this problem (assuming the model from the basic examples is present) is:

import os
from ase.io import read
import mala
from mala.datahandling.data_repo import data_repo_path
data_path = os.path.join(data_repo_path, "Be2")

# Trigger LAMMPS by performing inference on an atomic snapshot.
parameters, network, data_handler, predictor = mala.Predictor.\
    load_run("be_model", path="basic")
atoms = read(os.path.join(data_path, "Be_snapshot3.out"))
ldos = predictor.predict_for_atoms(atoms)
ldos_calculator: mala.LDOS = predictor.target_calculator
ldos_calculator.read_from_array(ldos)

# Test OpenPMD.
params = mala.Parameters()
ldos_calculator = mala.LDOS. \
    from_numpy_file(params,
                    os.path.join(data_path,
                                 "Be_snapshot1.out.npy"))
ldos_calculator. \
    read_additional_calculation_data(os.path.join(data_path,
                                                  "Be_snapshot1.out"),
                                     "espresso-out")

# Write and then read in via OpenPMD and make sure all the info is
# retained.
ldos_calculator.write_to_openpmd_file("test_openpmd.h5",
                                      ldos_calculator.
                                      local_density_of_states)

This results in

free(): invalid pointer

python3.10:19539 terminated with signal 6 at PC=7f51bb34aa7c SP=7ffff1011760.  Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f51bb34aa7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f51bb2f6476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f51bb2dc7f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x896f6)[0x7f51bb33d6f6]
/lib/x86_64-linux-gnu/libc.so.6(+0xa0d7c)[0x7f51bb354d7c]
/lib/x86_64-linux-gnu/libc.so.6(+0xa2ac4)[0x7f51bb356ac4]
/lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x7f51bb3594d3]
/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(_ZNSt8__detail9_CompilerISt12regex_traitsIcEEC2EPKcS5_RKSt6localeNSt15regex_constants18syntax_option_typeE+0x735)[0x7f51a0852805]
/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(_ZN7openPMD6Series10parseInputESs+0x2f9)[0x7f51a082b7b9]
/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(_ZN7openPMD6SeriesC1ERKSsNS_6AccessES2_+0x28e)[0x7f51a083573e]
/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(+0x47fb2f)[0x7f51a07c3b2f]
/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(+0x3a14e3)[0x7f51a06e54e3]
/usr/bin/python3.10(+0x15b10e)[0x5561bace810e]
/usr/bin/python3.10(_PyObject_MakeTpCall+0x25b)[0x5561bacdebbb]
/usr/bin/python3.10(+0x1692cb)[0x5561bacf62cb]
/usr/bin/python3.10(_PyObject_Call+0x118)[0x5561bacf6e48]
/usr/bin/python3.10(+0x165b4b)[0x5561bacf2b4b]
/usr/bin/python3.10(+0x151f6b)[0x5561bacdef6b]
/home/fiedlerl/.local/lib/python3.10/site-packages/scipy/spatial/_distance_pybind.cpython-310-x86_64-linux-gnu.so(+0x2db5b)[0x7f5116929b5b]
/usr/bin/python3.10(_PyObject_MakeTpCall+0x25b)[0x5561bacdebbb]
/usr/bin/python3.10(_PyEval_EvalFrameDefault+0x75eb)[0x5561bacd802b]
/usr/bin/python3.10(+0x168ff1)[0x5561bacf5ff1]
/usr/bin/python3.10(_PyEval_EvalFrameDefault+0x19aa)[0x5561bacd23ea]
/usr/bin/python3.10(_PyFunction_Vectorcall+0x7c)[0x5561bace895c]
/usr/bin/python3.10(_PyEval_EvalFrameDefault+0x809)[0x5561bacd1249]
/usr/bin/python3.10(+0x140956)[0x5561baccd956]
/usr/bin/python3.10(PyEval_EvalCode+0x86)[0x5561badc1906]
/usr/bin/python3.10(+0x261b88)[0x5561badeeb88]
/usr/bin/python3.10(+0x25a86b)[0x5561bade786b]
/usr/bin/python3.10(+0x2618d5)[0x5561badee8d5]
/usr/bin/python3.10(_PyRun_SimpleFileObject+0x1a8)[0x5561badeddb8]
/usr/bin/python3.10(_PyRun_AnyFileObject+0x43)[0x5561badedab3]
/usr/bin/python3.10(Py_RunMain+0x2be)[0x5561badde5ee]
/usr/bin/python3.10(Py_BytesMain+0x2d)[0x5561badb48dd]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f51bb2ddd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f51bb2dde40]
/usr/bin/python3.10(_start+0x25)[0x5561badb47d5]

For good measure, one may throw in a mala.finalize() before the OpenPMD part, which calls the lammps.finalize() function, but this does not affect the error in any way.

franzpoeschel commented 1 year ago

Sounds interesting, I'll have a look

franzpoeschel commented 1 year ago

Hey Lenz @RandomDefaultUser,

this file is not part of the default sample data, right? If so, can you share it with me somehow, please?

# Trigger LAMMPS by performing inference on an atomic snapshot.
parameters, network, data_handler, predictor = mala.Predictor.\
    load_run("be_model", path="basic")

franzpoeschel commented 1 year ago

I replaced that line with something else from the test that looked similar, resulting in:

#!/usr/bin/env python

import os
from ase.io import read
import mala
from mala.datahandling.data_repo import data_repo_path

data_path = os.path.join(data_repo_path, "Be2")

# Trigger LAMMPS by performing inference on an atomic snapshot.
parameters, network, data_handler, predictor = mala.Predictor.load_run(
    "workflow_test", path=os.path.join(data_repo_path, "workflow_test")
)
parameters.targets.target_type = "LDOS"
parameters.targets.ldos_gridsize = 11
parameters.targets.ldos_gridspacing_ev = 2.5
parameters.targets.ldos_gridoffset_ev = -5
parameters.running.inference_data_grid = [18, 18, 27]
parameters.descriptors.descriptor_type = "Bispectrum"
parameters.descriptors.bispectrum_twojmax = 10
parameters.descriptors.bispectrum_cutoff = 4.67637
parameters.targets.pseudopotential_path = data_path

predicted_ldos = predictor. \
    predict_from_qeout(os.path.join(data_path,
                                    "Be_snapshot3.out"))

ldos_calculator: mala.LDOS
ldos_calculator = data_handler.target_calculator
ldos_calculator. \
    read_additional_calculation_data(os.path.join(data_path,
                                                    "Be_snapshot3.out"),
                                        "espresso-out")
ldos_calculator.read_from_array(predicted_ldos)
# total_energy_traditional = ldos_calculator.total_energy
# parameters.descriptors.use_atomic_density_energy_formula = True
ldos_calculator.read_from_array(predicted_ldos)

# Test OpenPMD.
params = mala.Parameters()
ldos_calculator = mala.LDOS.from_numpy_file(
    params, os.path.join(data_path, "Be_snapshot1.out.npy")
)
ldos_calculator.read_additional_calculation_data(
    os.path.join(data_path, "Be_snapshot1.out"), "espresso-out"
)

# Write and then read in via OpenPMD and make sure all the info is
# retained.
ldos_calculator.write_to_openpmd_file(
    "test_openpmd.h5", ldos_calculator.local_density_of_states
)

This runs without problems for me.

A bug like this might depend on the specific setup that you are using. Can you please tell me:

  1. The version of openPMD-api and how you installed it
  2. The version of Lammps (Lammps is probably a manual installation?)
  3. The version of Mala where you experience the bug
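
One way to collect the Python-side version information requested above is to query the installed package metadata. This is only a minimal sketch: the distribution names passed in (`openpmd-api`, `mala`, `lammps`) are assumptions and may differ from the names used by a given installation, and a manually built Lammps may not be registered as a Python distribution at all.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_report(distributions):
    """Map each distribution name to its version (None when missing)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    # The distribution names below are assumptions; adjust them locally.
    report = version_report(["openpmd-api", "mala", "lammps"])
    for name, version in report.items():
        print(f"{name}: {version or 'not found in package metadata'}")
```

A `None` entry only means that no package metadata was found under that name, not that the library is absent from the system.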

franzpoeschel commented 1 year ago

What's a bit weird: According to your backtrace, the error occurs very early during construction of the Series object, before any IO access is made. Apparently, the error occurs inside the C++ standard library during compilation of a Regex that we use for parsing:

/home/fiedlerl/.local/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so(std::__detail::_Compiler<std::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type)+0x735)[0x7f51a0852805]

franzpoeschel commented 1 year ago

Some weirdness seems to be going on in the linker: for some reason, the openPMD shared library resolves C++ STL symbols in the Lammps shared library. When compiling a GPU-aware Lammps, this is likely to lead to ABI incompatibilities.

#0  0x00007ffe10eb0cd0 in __cxa_throw () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#1  0x00007ffe0f7e3365 in __cxa_bad_cast () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#2  0x00007ffe10edbf00 in std::__cxx11::collate<char> const& std::use_facet<std::__cxx11::collate<char> >(std::locale const&) () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#3  0x00007ffe10e7f97c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__cxx11::regex_traits<char>::transform<char*>(char*, char*) const () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#4  0x00007ffe10e7f718 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::__cxx11::regex_traits<char>::transform_primary<char const*>(char const*, char const*) const ()
   from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#5  0x00007ffe10e7f58f in std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_apply(char, std::integral_constant<bool, false>) const::{lambda()#1}::operator()() const () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#6  0x00007ffe10e7eb1b in std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_ready() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#7  0x00007ffe10e8216d in void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool) () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#8  0x00007ffe10e7df78 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#9  0x00007ffe10e7a756 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#10 0x00007ffe10e79b0b in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#11 0x00007ffe10e79bba in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#12 0x00007ffe10e779f4 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#13 0x00007ffe10e7a9fc in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#14 0x00007ffe10e79b0b in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#15 0x00007ffe10e79bba in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#16 0x00007ffe10e79bba in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#17 0x00007ffe10e779f4 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() () from /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so
#18 0x00007ffe0a15e9dd in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) ()
   from /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#19 0x00007ffe0a133f7f in openPMD::Series::parseInput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
   from /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#20 0x00007ffe0a13da44 in openPMD::Series::Series(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, openPMD::Access, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#21 0x00007ffe0a0d03b8 in ?? () from /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#22 0x00007ffe09fdd5b0 in ?? () from /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so
#23 0x00007ffff7cf3f63 in cfunction_call () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#24 0x00007ffff7c87c84 in _PyObject_MakeTpCall () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#25 0x00007ffff7ce9d12 in method_vectorcall () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#26 0x00007ffff7c99a08 in PyVectorcall_Call () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#27 0x00007ffff7d391c2 in slot_tp_init () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#28 0x00007ffff7cea1c7 in type_call () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#29 0x00007ffe2b7f8fb7 in pybind11_meta_call () from /nix/store/15nzi2f67rg8nbxlgdws68kcjyqgnhlg-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#30 0x00007ffff7c87c84 in _PyObject_MakeTpCall () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#31 0x00007ffff7c39f69 in _PyEval_EvalFrameDefault () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#32 0x00007ffff7db327f in _PyEval_Vector () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#33 0x00007ffff7ce9cd8 in method_vectorcall () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#34 0x00007ffff7c38344 in _PyEval_EvalFrameDefault () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#35 0x00007ffff7db327f in _PyEval_Vector () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#36 0x00007ffff7c395b9 in _PyEval_EvalFrameDefault () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#37 0x00007ffff7db327f in _PyEval_Vector () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#38 0x00007ffff7db38e8 in PyEval_EvalCode () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#39 0x00007ffff7e3aa9d in run_mod () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#40 0x00007ffff7e47572 in _PyRun_SimpleFileObject () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#41 0x00007ffff7e47b4b in _PyRun_AnyFileObject () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#42 0x00007ffff7e4bd0f in Py_RunMain () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#43 0x00007ffff7e4c535 in Py_BytesMain () from /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0
#44 0x00007ffff78af24e in __libc_start_call_main () from /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6

I honestly have no idea how this even happens. For now, a workaround is just adding import openpmd_api at the start of the file, so the linker knows about openPMD from the start. I'll try to figure out how this happened.
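
The import-order workaround described above could be sketched as a small helper placed at the very top of a script. The helper name `preload` is hypothetical and not part of MALA or openPMD-api; it simply imports the listed modules in order and reports which ones were available.

```python
import importlib


def preload(module_names):
    """Import modules in the given order and report which ones loaded."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Module not installed (or not importable) in this environment.
            status[name] = False
    return status
```

Calling `preload(["openpmd_api", "mala"])` before any other import would load openPMD-api first, so that its C++ symbols are resolved before Lammps is loaded. This only hides the symptom; it does not remove the underlying incompatibility.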

RandomDefaultUser commented 1 year ago

Thanks for the investigation! I am glad that the error is reproducible, that helps a lot. At least now we know where to look...

ax3l commented 1 year ago

Hi, is it possible that some components (lammps, openPMD-api) are not built with the same compilers / stdlibs?

I see that lammps.so was built with Nix; which source did openPMD-api come from? Can you try building both with the same toolchain?

I suspect that something in the lammps build exposes or overwrites symbols of the stdlib or some other incompatibility in build toolchains is going on.

franzpoeschel commented 1 year ago

Thank you for looking at this, Axel!

I built both openPMD and Lammps with Nix and their dependencies should be compatible.

The dynamically linked dependencies are:

> ldd /nix/store/f0vy6p4m96j21s9fg2ywd28d5d3wdini-python3.10-openPMD-api-0.15.1/lib/python3.10/site-packages/openpmd_api/openpmd_api_cxx.cpython-310-x86_64-linux-gnu.so | sort                                     
        /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc6000)
        libadios2_atl.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_atl.so.2 (0x00007ffff5855000)
        libadios2_core.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_core.so.2 (0x00007ffff63ae000)
        libadios2_core_mpi.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_core_mpi.so.2 (0x00007ffff6b5f000)
        libadios2_cxx11.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_cxx11.so.2 (0x00007ffff7434000)
        libadios2_cxx11_mpi.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_cxx11_mpi.so.2 (0x00007ffff75c2000)
        libadios2_dill.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_dill.so.2 (0x00007ffff5100000)
        libadios2_evpath.so => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_evpath.so (0x00007ffff58cd000)
        libadios2_ffs.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_ffs.so.2 (0x00007ffff5864000)
        libadios2_perfstubs.so => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_perfstubs.so (0x00007ffff5b01000)
        libatomic.so.1 => /nix/store/b13h86pg7lbf6vpc1vwzw6akmakyw1bs-gcc-11.3.0-lib/lib/libatomic.so.1 (0x00007ffff5153000)
        libbfd-2.39.so => /nix/store/7c8vx9wngib658cfx5pnnfi370a37ppm-libbfd-2.39/lib/libbfd-2.39.so (0x00007ffff5160000)
        libblosc2.so.2 => /nix/store/nagq9kg0b6m2yrxn30v15pz5sa44w3f1-blosc2-v2.4.3/lib/libblosc2.so.2 (0x00007ffff595c000)
        libbz2.so.1 => /nix/store/61rpfcaxhyqfmnk5qp4z7hf20wh9zgrk-bzip2-1.0.8/lib/libbz2.so.1 (0x00007ffff5947000)
        libc.so.6 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6 (0x00007ffff6be6000)
        libdl.so.2 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libdl.so.2 (0x00007ffff6bdf000)
        libevent_core-2.1.so.7 => /nix/store/icmm0jx9al1dhr60fh4mmvi5sqxl6wh9-libevent-2.1.12/lib/libevent_core-2.1.so.7 (0x00007ffff5b0e000)
        libevent_pthreads-2.1.so.7 => /nix/store/icmm0jx9al1dhr60fh4mmvi5sqxl6wh9-libevent-2.1.12/lib/libevent_pthreads-2.1.so.7 (0x00007ffff5b07000)
        libfabric.so.1 => /nix/store/jv6kda0z8m9kw5kvs8inhdgxwasp431f-libfabric-1.15.1/lib/libfabric.so.1 (0x00007ffff5daf000)
        libgcc_s.so.1 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libgcc_s.so.1 (0x00007ffff6def000)
        libhdf5.so.100.1.0 => /nix/store/skqp7rnc98qyslxg8231s8yhg4p8483w-hdf5-1.10.1/lib/libhdf5.so.100.1.0 (0x00007ffff75cb000)
        libhwloc.so.15 => /nix/store/jwbh8kj703ns9p7cdcsxg2kl1ggaw7va-hwloc-2.8.0-lib/lib/libhwloc.so.15 (0x00007ffff5b45000)
        libibverbs.so.1 => /nix/store/bl6qfz0vqf4l9zd3hx0y29v7rvym6b8p-rdma-core-43.0/lib/libibverbs.so.1 (0x00007ffff5d6e000)
        libm.so.6 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libm.so.6 (0x00007ffff6e09000)
        libmpi.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libmpi.so.40 (0x00007ffff70ff000)
        libnl-3.so.200 => /nix/store/i5k5d396psw59zvgmy9r6qzmsckgz2vh-libnl-3.7.0/lib/libnl-3.so.200 (0x00007ffff5c54000)
        libnl-route-3.so.200 => /nix/store/i5k5d396psw59zvgmy9r6qzmsckgz2vh-libnl-3.7.0/lib/libnl-route-3.so.200 (0x00007ffff5bc1000)
        libnuma.so.1 => /nix/store/94kqdwqz1qdlcv5y07hsrs0z1a5dgqpd-numactl-2.0.16/lib/libnuma.so.1 (0x00007ffff5840000)
        libopen-pal.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libopen-pal.so.40 (0x00007ffff60d6000)
        libopen-rte.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libopen-rte.so.40 (0x00007ffff621a000)
        libpmix.so.2 => /nix/store/f80qm7xlg6q4rh9hd35rxll6vhxk3qvb-pmix-3.2.3/lib/libpmix.so.2 (0x00007ffff5c78000)
        libpsm2.so.2 => /nix/store/9hj5fhj0fpfxcsiyyh36c1jz2bh6ab2p-libpsm2-11.2.229/lib/libpsm2.so.2 (0x00007ffff6342000)
        libpthread.so.0 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libpthread.so.0 (0x00007ffff583b000)
        librdmacm.so.1 => /nix/store/bl6qfz0vqf4l9zd3hx0y29v7rvym6b8p-rdma-core-43.0/lib/librdmacm.so.1 (0x00007ffff5d8f000)
        librt.so.1 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/librt.so.1 (0x00007ffff584e000)
        libstdc++.so.6 => /nix/store/b13h86pg7lbf6vpc1vwzw6akmakyw1bs-gcc-11.3.0-lib/lib/libstdc++.so.6 (0x00007ffff6ee9000)
        libucm.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucm.so.0 (0x00007ffff5f2e000)
        libucp.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucp.so.0 (0x00007ffff5f97000)
        libucs.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucs.so.0 (0x00007ffff5ec1000)
        libuct.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libuct.so.0 (0x00007ffff5f4e000)
        libz.so.1 => /nix/store/fblaj5ywkgphzpp5kx41av32kls9256y-zlib-1.2.13/lib/libz.so.1 (0x00007ffff5ba3000)
        linux-vdso.so.1 (0x00007ffff7fc5000)

> ldd /nix/store/maynhrzavj8xzyxrr7i22xf47jxq22g8-Lammps-8Feb2023/lib/liblammps.so | sort             
        /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc6000)
        libatomic.so.1 => /nix/store/b13h86pg7lbf6vpc1vwzw6akmakyw1bs-gcc-11.3.0-lib/lib/libatomic.so.1 (0x00007fffedfbc000)
        libbfd-2.39.so => /nix/store/7c8vx9wngib658cfx5pnnfi370a37ppm-libbfd-2.39/lib/libbfd-2.39.so (0x00007fffedfc9000)
        libc.so.6 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libc.so.6 (0x00007fffeef17000)
        libcrypt.so.1 => /nix/store/9r9v2agfvn1zaifqjwyi9db67p48z0sd-libxcrypt-4.4.30/lib/libcrypt.so.1 (0x00007fffef4ec000)
        libcuda.so.1 => /.singularity.d/libs/libcuda.so.1 (0x00007fffef54d000)
        libcudart.so.11.0 => /nix/store/cfwcn5kvvcg2j13hvf9cv7siwvkjgvni-cudatoolkit-11.7.0-lib/lib/libcudart.so.11.0 (0x00007fffef200000)
        libdl.so.2 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libdl.so.2 (0x00007fffef548000)
        libevent_core-2.1.so.7 => /nix/store/icmm0jx9al1dhr60fh4mmvi5sqxl6wh9-libevent-2.1.12/lib/libevent_core-2.1.so.7 (0x00007fffee6b9000)
        libevent_pthreads-2.1.so.7 => /nix/store/icmm0jx9al1dhr60fh4mmvi5sqxl6wh9-libevent-2.1.12/lib/libevent_pthreads-2.1.so.7 (0x00007fffee6b2000)
        libfabric.so.1 => /nix/store/jv6kda0z8m9kw5kvs8inhdgxwasp431f-libfabric-1.15.1/lib/libfabric.so.1 (0x00007fffee93a000)
        libgcc_s.so.1 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libgcc_s.so.1 (0x00007fffef527000)
        libhwloc.so.15 => /nix/store/jwbh8kj703ns9p7cdcsxg2kl1ggaw7va-hwloc-2.8.0-lib/lib/libhwloc.so.15 (0x00007fffee6f0000)
        libibverbs.so.1 => /nix/store/bl6qfz0vqf4l9zd3hx0y29v7rvym6b8p-rdma-core-43.0/lib/libibverbs.so.1 (0x00007fffee919000)
        libm.so.6 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libm.so.6 (0x00007fffef120000)
        libmpi.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libmpi.so.40 (0x00007ffff131b000)
        libnl-3.so.200 => /nix/store/i5k5d396psw59zvgmy9r6qzmsckgz2vh-libnl-3.7.0/lib/libnl-3.so.200 (0x00007fffee7ff000)
        libnl-route-3.so.200 => /nix/store/i5k5d396psw59zvgmy9r6qzmsckgz2vh-libnl-3.7.0/lib/libnl-route-3.so.200 (0x00007fffee76c000)
        libnuma.so.1 => /nix/store/94kqdwqz1qdlcv5y07hsrs0z1a5dgqpd-numactl-2.0.16/lib/libnuma.so.1 (0x00007fffee6a4000)
        libomp.so => /nix/store/srddjzm4hdvyiw0k7il4j65mimcfs4a4-openmp-11.1.0/lib/libomp.so (0x00007ffff1235000)
        libopen-pal.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libopen-pal.so.40 (0x00007fffeec41000)
        libopen-rte.so.40 => /nix/store/zidndx02ksdqv2szkwgxymb42s5gimfj-openmpi-4.1.4/lib/libopen-rte.so.40 (0x00007fffeed85000)
        libpmix.so.2 => /nix/store/f80qm7xlg6q4rh9hd35rxll6vhxk3qvb-pmix-3.2.3/lib/libpmix.so.2 (0x00007fffee825000)
        libpsm2.so.2 => /nix/store/9hj5fhj0fpfxcsiyyh36c1jz2bh6ab2p-libpsm2-11.2.229/lib/libpsm2.so.2 (0x00007fffeeead000)
        libpthread.so.0 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/libpthread.so.0 (0x00007ffff122e000)
        libpython3.10.so.1.0 => /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0 (0x00007ffff164e000)
        librdmacm.so.1 => /nix/store/bl6qfz0vqf4l9zd3hx0y29v7rvym6b8p-rdma-core-43.0/lib/librdmacm.so.1 (0x00007fffef4aa000)
        librt.so.1 => /nix/store/9xfad3b5z4y00mzmk2wnn4900q0qmxns-glibc-2.35-224/lib/librt.so.1 (0x00007fffef543000)
        libucm.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucm.so.0 (0x00007fffef4ca000)
        libucp.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucp.so.0 (0x00007fffeeb02000)
        libucs.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libucs.so.0 (0x00007fffeea4c000)
        libuct.so.0 => /nix/store/mzfrxasizd3i38w02sa6i7xd8gd5r2i4-ucx-1.13.1/lib/libuct.so.0 (0x00007fffeeab9000)
        libz.so.1 => /nix/store/fblaj5ywkgphzpp5kx41av32kls9256y-zlib-1.2.13/lib/libz.so.1 (0x00007fffee74e000)
        linux-vdso.so.1 (0x00007ffff7fc5000)

Here is the diff of both shared objects' dependencies:

>       libadios2_atl.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_atl.so.2 
>       libadios2_core.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_core.so.2 
>       libadios2_core_mpi.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_core_mpi.so.2 
>       libadios2_cxx11.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_cxx11.so.2 
>       libadios2_cxx11_mpi.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_cxx11_mpi.so.2 
>       libadios2_dill.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_dill.so.2 
>       libadios2_evpath.so => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_evpath.so 
>       libadios2_ffs.so.2 => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_ffs.so.2 
>       libadios2_perfstubs.so => /nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib//../../../..//nix/store/4jpc9p41sca0l244bhq83icgjwyjd964-adios2-v2.9.0/lib/libadios2_perfstubs.so 
3a13,14
>       libblosc2.so.2 => /nix/store/nagq9kg0b6m2yrxn30v15pz5sa44w3f1-blosc2-v2.4.3/lib/libblosc2.so.2 
>       libbz2.so.1 => /nix/store/61rpfcaxhyqfmnk5qp4z7hf20wh9zgrk-bzip2-1.0.8/lib/libbz2.so.1 
5,7d15
<       libcrypt.so.1 => /nix/store/9r9v2agfvn1zaifqjwyi9db67p48z0sd-libxcrypt-4.4.30/lib/libcrypt.so.1 
<       libcuda.so.1 => /.singularity.d/libs/libcuda.so.1 
<       libcudart.so.11.0 => /nix/store/cfwcn5kvvcg2j13hvf9cv7siwvkjgvni-cudatoolkit-11.7.0-lib/lib/libcudart.so.11.0 
12a21
>       libhdf5.so.100.1.0 => /nix/store/skqp7rnc98qyslxg8231s8yhg4p8483w-hdf5-1.10.1/lib/libhdf5.so.100.1.0 
20d28
<       libomp.so => /nix/store/srddjzm4hdvyiw0k7il4j65mimcfs4a4-openmp-11.1.0/lib/libomp.so 
26d33
<       libpython3.10.so.1.0 => /nix/store/5axq6aw8j3vcs2m7gi440cwpcckl7ql9-python3-3.10.9/lib/libpython3.10.so.1.0 
28a36
>       libstdc++.so.6 => /nix/store/b13h86pg7lbf6vpc1vwzw6akmakyw1bs-gcc-11.3.0-lib/lib/libstdc++.so.6 

However, Lammps is built with nvcc+gcc11.3.0 while openPMD is directly built with gcc11.3.0.

What seems weird is that Lammps does not link to libstdc++.so.6 at all, but somehow still carries its symbols.

I suspect that something in the lammps build exposes or overwrites symbols of the stdlib or some other incompatibility in build toolchains is going on.

So we should probably ask the Lammps developers if their code does anything that could be causing this?
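
The dependency comparison above can also be done programmatically. The following is a rough sketch, assuming the text output of `ldd` has already been captured as a string for each shared object; the function names `parse_ldd` and `only_in_first` are hypothetical helpers.

```python
def parse_ldd(output):
    """Map library name -> resolved path from `ldd` text output."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # skip the dynamic loader and linux-vdso lines
        name, _, rest = line.partition("=>")
        # Drop the trailing load address, e.g. "(0x00007ffff6ee9000)".
        path = rest.rsplit("(", 1)[0].strip()
        libs[name.strip()] = path
    return libs


def only_in_first(first, second):
    """Library names that appear in `first` but not in `second`."""
    return sorted(set(first) - set(second))
```

With the two listings above as input, `only_in_first(openpmd_libs, lammps_libs)` would be expected to include `libstdc++.so.6`, matching the last entry of the diff.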

franzpoeschel commented 10 months ago

I have tried looking into this once more, and I think that I have found out what caused the issue on my end. Since the symptoms on your end seem to be the same, it's likely that we're looking at the same thing here.

In the failing build environment, I had built Lammps with NVCC, but my Kokkos build was with Clang (I had run into issues with a gcc build and picked Clang as an alternative). So, openPMD-api and Lammps were referring to two different C++ standard libraries that are ABI-incompatible but use the same symbol names. Since one symbol cannot exist twice in the same application context, whichever library loads its symbols first gets the first shot. This is why the error can be suppressed by adding an early import openpmd_api.
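
A quick, Linux-only way to check for this kind of mix-up is to look at which C++ runtime libraries are mapped into the running Python process after all imports are done. The sketch below parses the text of `/proc/self/maps`; the helper names are hypothetical. Note that it would not reveal a runtime that was linked statically into another shared object, which the `ldd` output earlier in this thread suggests may be the case for liblammps.so.

```python
import re

# File names of common C++ runtimes (libstdc++ from GCC, libc++ from LLVM).
CXX_RUNTIME = re.compile(r"/(libstdc\+\+|libc\+\+(abi)?)\.so[^/]*$")


def cxx_runtimes(maps_text):
    """Return the distinct C++ runtime paths found in /proc/<pid>/maps text."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if CXX_RUNTIME.search(path) and path not in found:
            found.append(path)
    return found


def loaded_cxx_runtimes():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return cxx_runtimes(handle.read())
```

If `loaded_cxx_runtimes()` returns more than one distinct runtime (for example both a libstdc++ and a libc++ path) after importing mala and openpmd_api, that would be a hint that the environment mixes toolchains.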

I tested my environment from back then again and can still reproduce the issue. After setting up a new environment that builds Kokkos and openPMD-api both with the same software stack (gcc+nvcc / gcc), the script runs fine without an error.

TL;DR: This is likely not a bug, but rather a misconfigured software environment with incompatible dependencies. @RandomDefaultUser Do you still know how you had set up your environment for this bug to occur? It would also be interesting to see if adding an import openpmd_api can suppress this issue for you as well.