mir-group / pair_allegro

LAMMPS pair style for Allegro deep learning interatomic potentials with parallelization support
https://www.nature.com/articles/s41467-023-36329-y
MIT License

Issue while Linking CXX executable lmp #3

Closed PythonFZ closed 2 years ago

PythonFZ commented 2 years ago

I'm encountering some issues compiling LAMMPS with pair_allegro. I'm using spack with the following modules:

module load numlib/mkl/2021.4.0
module load devel/cudnn/10.2
module load compiler/llvm/10.0
module load mpi/openmpi/4.1

After running the patch script, I configured the build with cmake ../cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'`

where the installed PyTorch version is

>>> torch.__version__
'1.10.2+cu102'

the lammps build process fails with

[100%] Linking CXX executable lmp
liblammps.a(pair_allegro.cpp.o): In function `LAMMPS_NS::PairAllegro::coeff(int, char**)':
pair_allegro.cpp:(.text+0x11f4): undefined reference to `torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&)'
pair_allegro.cpp:(.text+0x13c2): undefined reference to `torch::jit::freeze_module(torch::jit::Module const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool, bool)'
liblammps.a(pair_allegro.cpp.o): In function `LAMMPS_NS::PairAllegro::compute(int, int)':
pair_allegro.cpp:(.text+0x3b71): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `torch::jit::Object::get_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const':
pair_allegro.cpp:(.text._ZNK5torch3jit6Object10get_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZNK5torch3jit6Object10get_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x15): undefined reference to `torch::jit::Object::find_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
pair_allegro.cpp:(.text._ZNK5torch3jit6Object10get_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZNK5torch3jit6Object10get_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x17f): undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `at::TensorAccessor<float, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<float, 2ul>() const &':
pair_allegro.cpp:(.text._ZNKR2at10TensorBase8accessorIfLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv[_ZNKR2at10TensorBase8accessorIfLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv]+0xb7): undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `at::TensorAccessor<long, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<long, 2ul>() const &':
pair_allegro.cpp:(.text._ZNKR2at10TensorBase8accessorIlLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv[_ZNKR2at10TensorBase8accessorIlLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv]+0xb7): undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `at::TensorAccessor<long, 1ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<long, 1ul>() const &':
pair_allegro.cpp:(.text._ZNKR2at10TensorBase8accessorIlLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv[_ZNKR2at10TensorBase8accessorIlLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv]+0xb7): undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&)':
pair_allegro.cpp:(.text._ZN5torch3jit6Module7forwardESt6vectorIN3c106IValueESaIS4_EERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4hashISD_ESt8equal_toISD_ESaISt4pairIKSD_S4_EEE[_ZN5torch3jit6Module7forwardESt6vectorIN3c106IValueESaIS4_EERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4hashISD_ESt8equal_toISD_ESaISt4pairIKSD_S4_EEE]+0x7f): undefined reference to `torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const'
liblammps.a(pair_allegro.cpp.o): In function `c10::IValue::IValue(char const*)':
pair_allegro.cpp:(.text._ZN3c106IValueC2EPKc[_ZN3c106IValueC2EPKc]+0x74): undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
liblammps.a(pair_allegro.cpp.o): In function `c10::Device::validate()':
pair_allegro.cpp:(.text._ZN3c106Device8validateEv[_ZN3c106Device8validateEv]+0x59): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
pair_allegro.cpp:(.text._ZN3c106Device8validateEv[_ZN3c106Device8validateEv]+0x9c): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `c10::TensorOptions::set_dtype(c10::optional<c10::ScalarType>) &':
pair_allegro.cpp:(.text._ZNR3c1013TensorOptions9set_dtypeENS_8optionalINS_10ScalarTypeEEE[_ZNR3c1013TensorOptions9set_dtypeENS_8optionalINS_10ScalarTypeEEE]+0x75): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `std::pair<c10::IValue, c10::IValue>::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, at::Tensor, true>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, at::Tensor&&)':
pair_allegro.cpp:(.text._ZNSt4pairIN3c106IValueES1_EC2INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN2at6TensorELb1EEEOT_OT0_[_ZNSt4pairIN3c106IValueES1_EC2INSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN2at6TensorELb1EEEOT_OT0_]+0xa7): undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
liblammps.a(pair_allegro.cpp.o): In function `c10::IValue::toStringView() const':
pair_allegro.cpp:(.text._ZNK3c106IValue12toStringViewEv[_ZNK3c106IValue12toStringViewEv]+0x6b): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_allegro.cpp.o): In function `c10::IValue::toComplexDouble() const':
pair_allegro.cpp:(.text._ZNK3c106IValue15toComplexDoubleEv[_ZNK3c106IValue15toComplexDoubleEv]+0xd9): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_nequip.cpp.o): In function `LAMMPS_NS::PairNEQUIP::coeff(int, char**)':
pair_nequip.cpp:(.text+0xeaf): undefined reference to `torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&)'
pair_nequip.cpp:(.text+0x108c): undefined reference to `torch::jit::freeze_module(torch::jit::Module const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool, bool)'
liblammps.a(pair_nequip.cpp.o): In function `LAMMPS_NS::PairNEQUIP::compute(int, int)':
pair_nequip.cpp:(.text+0x467d): undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
liblammps.a(pair_nequip.cpp.o): In function `at::TensorAccessor<float, 1ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<float, 1ul>() const &':
pair_nequip.cpp:(.text._ZNKR2at10TensorBase8accessorIfLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv[_ZNKR2at10TensorBase8accessorIfLm1EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv]+0xb7): undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/lmp.dir/build.make:128: lmp] Error 1
make[1]: *** [CMakeFiles/Makefile2:1335: CMakeFiles/lmp.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

Does pair_allegro require a different PyTorch version? I found https://github.com/mir-group/pair_allegro/blob/c8b0738a81303e0b46d4a50c77fd066993c7d063/.github/workflows/tests.yml#L12 and assumed the requirement is similar to that of pair_nequip.

Linux-cpp-lisp commented 2 years ago

@PythonFZ I would try downloading libtorch separately (see https://github.com/mir-group/pair_allegro#libtorch) and make sure to get the C++11 ABI version.
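
Every undefined symbol in the log above takes a `std::__cxx11::basic_string` argument, which suggests the LAMMPS objects were compiled with the C++11 ABI while the Torch libraries from the pip wheel were probably built with the older ABI. The sketch below is one way to check both sides. The helper names (`undefined_symbols`, `caller_uses_cxx11_abi`, `torch_uses_cxx11_abi`) are hypothetical and only for illustration; `torch.compiled_with_cxx11_abi()` is the one PyTorch call used.

```python
import re

# GCC's libstdc++ places std::string in the inline namespace std::__cxx11
# when code is compiled with _GLIBCXX_USE_CXX11_ABI=1 (the "C++11 ABI").
CXX11_MARKER = "std::__cxx11::"


def undefined_symbols(linker_log: str) -> list:
    """Return the demangled symbols named in 'undefined reference to' lines."""
    return re.findall(r"undefined reference to `([^']+)'", linker_log)


def caller_uses_cxx11_abi(linker_log: str) -> bool:
    """True if any missing symbol carries the C++11 ABI marker.

    Such a symbol was requested by an object file built with the C++11 ABI,
    so the library being linked must export the same spelling.
    """
    return any(CXX11_MARKER in sym for sym in undefined_symbols(linker_log))


def torch_uses_cxx11_abi():
    """Report the ABI of the installed PyTorch, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return torch.compiled_with_cxx11_abi()
```

If `caller_uses_cxx11_abi(log)` returns `True` for the build log while `torch_uses_cxx11_abi()` returns `False`, the two sides disagree on the ABI. That would be consistent with the advice to download the C++11 ABI build of libtorch and point `CMAKE_PREFIX_PATH` at it.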

PythonFZ commented 2 years ago

Using libtorch and cudnn > 10 solved the issue. Thanks a lot.