mir-group / pair_nequip


LAMMPS failed with c10::Error #28


hhlim12 commented 2 years ago

Hi, thank you very much for developing NequIP. Although training works without problems (on GPU), I get an error when running the model in LAMMPS: `terminate called after throwing an instance of 'c10::Error' what(): expected scalar type Float but found Byte`, which is probably related to https://github.com/mir-group/pair_nequip/discussions/25#discussion-4180821. I used PyTorch 1.11 and the 29 Sep 2021 LAMMPS release as suggested, and I installed NequIP 0.5.5 with PyTorch 1.11. I also tried libtorch 1.11 instead of PyTorch, but the same error occurred. The output is below.

```
LAMMPS (29 Sep 2021 - Update 2)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  orthogonal box = (0.0000000 0.0000000 0.0000000) to (30.000000 30.000000 30.000000)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  21 atoms
  read_data CPU = 0.001 seconds
NEQUIP is using device cuda
NequIP Coeff: type 1 is element H
NequIP Coeff: type 2 is element O
NequIP Coeff: type 3 is element C
Loading model from aspirin.pth
Freezing TorchScript model...
WARNING: Using 'neigh_modify every 1 delay 0 check yes' setting during minimization (src/min.cpp:188)
Neighbor list info ...
  update every 1 steps, delay 0 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 5
  ghost atom cutoff = 5
  binsize = 2.5, bins = 12 12 12
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair nequip, perpetual
      attributes: full, newton off
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up cg style minimization ...
  Unit style    : real
  Current step  : 0
terminate called after throwing an instance of 'c10::Error'
  what():  expected scalar type Float but found Byte
Exception raised from data_ptr<float> at /opt/conda/conda-bld/pytorch_1646755903507/work/build/aten/src/ATen/core/TensorMethods.cpp:18 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x14d0984b31bd in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x68 (0x14d0984af838 in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: float* at::TensorBase::data_ptr<float>() const + 0xde (0x14d09a3abc3e in /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::TensorAccessor<float, 2ul, at::DefaultPtrTraits, long> at::TensorBase::accessor<float, 2ul>() const & + 0xcb (0x8bea4b in ./lmp)
frame #4: ./lmp() [0x8b66b2]
frame #5: ./lmp() [0x477689]
frame #6: ./lmp() [0x47be8e]
frame #7: ./lmp() [0x439995]
frame #8: ./lmp() [0x43799b]
frame #9: ./lmp() [0x41a416]
frame #10: __libc_start_main + 0xf3 (0x14d063f84493 in /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6)
frame #11: ./lmp() [0x41a2ee]

[acc008:691367] *** Process received signal ***
[acc008:691367] Signal: Aborted (6)
[acc008:691367] Signal code:  (-6)
[acc008:691367] [ 0] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libpthread.so.0(+0x12c20)[0x14d0649dac20]
[acc008:691367] [ 1] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(gsignal+0x10f)[0x14d063f9837f]
[acc008:691367] [ 2] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(abort+0x127)[0x14d063f82db5]
[acc008:691367] [ 3] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x9009b)[0x14d06597a09b]
[acc008:691367] [ 4] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x9653c)[0x14d06598053c]
[acc008:691367] [ 5] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x96597)[0x14d065980597]
[acc008:691367] [ 6] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libstdc++.so.6(+0x967f8)[0x14d0659807f8]
[acc008:691367] [ 7] /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libc10.so(_ZN3c106detail14torchCheckFailEPKcS2_jRKSs+0x93)[0x14d0984af863]
[acc008:691367] [ 8] /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so(_ZNK2at10TensorBase8data_ptrIfEEPT_v+0xde)[0x14d09a3abc3e]
[acc008:691367] [ 9] ./lmp(_ZNKR2at10TensorBase8accessorIfLm2EEENS_14TensorAccessorIT_XT0_ENS_16DefaultPtrTraitsElEEv+0xcb)[0x8bea4b]
[acc008:691367] [10] ./lmp[0x8b66b2]
[acc008:691367] [11] ./lmp[0x477689]
[acc008:691367] [12] ./lmp[0x47be8e]
[acc008:691367] [13] ./lmp[0x439995]
[acc008:691367] [14] ./lmp[0x43799b]
[acc008:691367] [15] ./lmp[0x41a416]
[acc008:691367] [16] /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/libc.so.6(__libc_start_main+0xf3)[0x14d063f84493]
[acc008:691367] [17] ./lmp[0x41a2ee]
[acc008:691367] *** End of error message ***
Aborted (core dumped)
```

Curiously, when I compile LAMMPS against PyTorch 1.12 (CPU only), the MD runs successfully. I'd appreciate any suggestions for solving this problem.
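My reading of the backtrace is that the crash happens when pair_nequip takes a `float` accessor on one of the tensors it exchanges with the model (the `at::TensorBase::accessor<float, 2ul>` / `data_ptr<float>` frames above), and that call throws exactly this `c10::Error` whenever the tensor's dtype is not Float. The standalone libtorch snippet below is just my own sketch to confirm that mechanism, not code from pair_nequip:

```cpp
// Sketch only: reproduce "expected scalar type Float but found Byte" by taking
// a float accessor on a tensor whose dtype is Byte, which is what the
// backtrace suggests happens to one of the tensors in the pair style.
#include <torch/torch.h>
#include <iostream>

int main() {
  // An [N, 3] tensor with the wrong scalar type (kByte instead of kFloat).
  torch::Tensor forces = torch::zeros({21, 3}, torch::dtype(torch::kByte));

  try {
    // Same pattern as the failing frame: accessor<float, 2>() calls
    // data_ptr<float>(), which checks the scalar type and throws.
    auto f = forces.accessor<float, 2>();
    std::cout << f[0][0] << "\n";
  } catch (const c10::Error &e) {
    std::cout << e.what() << "\n";  // "expected scalar type Float but found Byte ..."
  }
  return 0;
}
```

So, if I read this correctly, on the NVHPC/CUDA build at least one tensor reaches the accessor as Byte rather than Float, which would be consistent with the PyTorch 1.12 CPU-only build working.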

Below are more details on the system and build (the `cmake` configuration output, followed by the warnings from `make`). Sorry for the lengthy message.

```
-- <<< Build configuration >>>
   Operating System: Linux Red Hat Enterprise Linux 8.5
   Build type:       RelWithDebInfo
   Install path:     /home/k0107/k010716/.local
   Generator:        Unix Makefiles using /bin/gmake
-- Enabled packages:
-- <<< Compilers and Flags: >>>
-- C++ Compiler:     /home/app/hpc_sdk/Linux_x86_64/22.2/compilers/bin/nvc++
      Type:          NVHPC
      Version:       22.2.0
      C++ Flags:     -O2 -gopt
      Defines:       LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_JPEG;LAMMPS_PNG;LAMMPS_GZIP
-- <<< Linker flags: >>>
-- Executable name:  lmp
-- Static library flags:
-- <<< MPI flags >>>
-- MPI_defines:      MPICH_SKIP_MPICXX;OMPI_SKIP_MPICXX;_MPICC_H
-- MPI includes:     /home/app/openmpi/4.1.2/include
-- MPI libraries:    /home/app/openmpi/4.1.2/lib/libmpi.so;
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /home/k0107/k010716/GPU/cuda/ (found version "11.6")
-- The CUDA compiler identification is NVIDIA 11.6.55
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /home/app/hpc_sdk/Linux_x86_64/22.2/compilers/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Caffe2: CUDA detected: 11.6
-- Caffe2: CUDA nvcc is: /home/k0107/k010716/GPU/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /home/k0107/k010716/GPU/cuda/
-- Caffe2: Header version is: 11.6
-- Found CUDNN: /home/k0107/k010716/GPU/cudnn/lib/libcudnn.so
-- Found cuDNN: v8.5.0 (include: /home/k0107/k010716/GPU/cudnn/include, library: /home/k0107/k010716/GPU/cudnn/lib/libcudnn.so)
-- /home/k0107/k010716/GPU/cuda/lib64/libnvrtc.so shorthash is 280a23f6
-- Autodetected CUDA architecture(s): 8.0 8.0 8.0 8.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80
CMake Warning at /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:922 (find_package)
-- Found Torch: /home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/k0107/k010716/LAMMPS/lammps-nequip4/build
```


- After `cmake`, I run `make` and get an executable, though some **warnings** are printed:

"/home/k0107/k010716/LAMMPS/lammps-nequip4/src/fmt/format.h", line 1156: warning: statement is unreachable return; ^ detected during: instantiation of "void fmt::v7_lmp::detail::specs_setter::on_fill(fmt::v7_lmp::basic_string_view) [with Char=char]" at line 2823 instantiation of "const Char fmt::v7_lmp::detail::parse_align(const Char , const Char , Handler &&) [with Char=char, Handler=fmt::v7_lmp::detail::specs_checker<fmt::v7_lmp::detail::specs_handler<fmt::v7_lmp::basic_format_parse_context<char, fmt::v7_lmp::detail::error_handler>, fmt::v7_lmp::buffer_context>> &]" at line 2883 instantiation of "const Char fmt::v7_lmp::detail::parse_format_specs(const Char , const Char , SpecHandler &&) [with Char=char, SpecHandler=fmt::v7_lmp::detail::specs_checker<fmt::v7_lmp::detail::specs_handler<fmt::v7_lmp::basic_format_parse_context<char, fmt::v7_lmp::detail::error_handler>, fmt::v7_lmp::buffer_context>> &]" at line 3099 instantiation of "const Char fmt::v7_lmp::detail::format_handler<OutputIt, Char, Context>::on_format_specs(int, const Char , const Char ) [with OutputIt=fmt::v7_lmp::detail::buffer_appender, Char=char, Context=fmt::v7_lmp::buffer_context]" at line 2975 instantiation of "const Char fmt::v7_lmp::detail::parse_replacement_field(const Char , const Char , Handler &&) [with Char=char, Handler=fmt::v7_lmp::detail::format_handler<fmt::v7_lmp::detail::buffer_appender, char, fmt::v7_lmp::buffer_context> &]" at line 2997 instantiation of "void fmt::v7_lmp::detail::parse_format_string<IS_CONSTEXPR,Char,Handler>(fmt::v7_lmp::basic_string_view, Handler &&) [with IS_CONSTEXPR=false, Char=char, Handler=fmt::v7_lmp::detail::format_handler<fmt::v7_lmp::detail::buffer_appender, char, fmt::v7_lmp::buffer_context> &]" at line 3776 instantiation of "void fmt::v7_lmp::detail::vformat_to(fmt::v7_lmp::detail::buffer &, fmt::v7_lmp::basic_string_view, fmt::v7_lmp::basic_format_args<fmt::v7_lmp::basic_format_context<fmt::v7_lmp::detail::buffer_appender<fmt::v7_lmp::type_identity_t>, fmt::v7_lmp::type_identity_t>>, fmt::v7_lmp::detail::locale_ref) [with Char=char]" at line 2752 of "/home/k0107/k010716/LAMMPS/lammps-nequip4/src/fmt/format-inl.h"

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)" } ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const" } ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)" } ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const" } ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/c10/core/TensorImpl.h", line 1669: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^ "/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 296: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/ATen/core/ivalue_inl.h", line 299: warning: unknown attribute "fallthrough" C10_FALLTHROUGH; ^ "/home/k0107/k010716/LAMMPS/lammps-nequip4/src/pair_nequip.cpp", line 390: warning: variable "jtype" was declared but never referenced int jtype = type[j]; ^

"/home/k0107/k010716/LAMMPS/lammps-nequip4/src/pair_nequip.cpp", line 382: warning: variable "itype" was declared but never referenced int itype = type[i]; ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 360: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &)" } ^

"/home/k0107/k010716/miniconda3/envs/lammps_nequip_rev3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/ordered_dict.h", line 368: warning: missing return statement at end of non-void function "torch::OrderedDict<Key, Value>::operator[](const Key &) const" } ^



Best regards, 
hhlim12 commented 2 years ago

I attach the deployed model, LAMMPS input file, and aspirin structure here in case they are needed.
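For reference, the input is essentially the standard pair_nequip setup. The sketch below is reconstructed from the log above (real units, the H/O/C type mapping, and the aspirin.pth model); the data file name, neighbor settings, and minimize parameters are placeholders rather than the exact values in the attached file:

```
# Sketch of the LAMMPS input (reconstructed; see the attached file for the exact version)
units           real
atom_style      atomic

read_data       aspirin.data            # assumed name for the attached structure file

pair_style      nequip
pair_coeff      * * aspirin.pth H O C   # type 1 = H, type 2 = O, type 3 = C, as in the log

min_style       cg                      # the log shows a cg-style minimization
minimize        1.0e-6 1.0e-8 100 1000  # assumed tolerances and step limits
```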