mir-group / pair_allegro

LAMMPS pair style for Allegro deep learning interatomic potentials with parallelization support
https://www.nature.com/articles/s41467-023-36329-y
MIT License

Issue of running NEB with mpirun #33

Open WJiangH opened 1 year ago

WJiangH commented 1 year ago

Hello Maintainers,

I've encountered an issue after compiling pair_allegro using the provided LAMMPS version in the repository. Specifically, I'm having trouble executing the "neb" command in LAMMPS.

The command I used is: mpiexec -np 6 lmp -partition 6x1 -in in.neb.sivac

Here, in.neb.sivac is sourced from the example folder in LAMMPS.
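
For context, `-partition 6x1` requests six partitions of one MPI rank each, and the partition sizes must add up to the `-np` value passed to mpiexec. A minimal sketch of that arithmetic is below. `partition_total` is a hypothetical helper, not part of LAMMPS, and it only handles the `MxN` and plain-integer forms of the argument.

```shell
#!/bin/sh
# partition_total (hypothetical helper): sum the MPI ranks requested by
# one or more LAMMPS -partition arguments such as "6x1" or "8x2 4 5".
partition_total() {
    total=0
    for spec in "$@"; do
        case "$spec" in
            *x*)
                count=${spec%%x*}   # number of partitions, left of the "x"
                size=${spec#*x}     # ranks per partition, right of the "x"
                total=$((total + count * size))
                ;;
            *)
                total=$((total + spec))   # a single partition of this size
                ;;
        esac
    done
    echo "$total"
}

partition_total 6x1       # prints 6, matching "mpiexec -np 6"
partition_total 8x2 4 5   # prints 25
```

If the printed total differs from the `-np` value, the run cannot be laid out as requested, so this is a cheap check to make before digging deeper.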

The error I received is:

LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

For building lmp, I used the following command:

cmake ../cmake \
-DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'`\
-DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON\
-DCUDA_TOOLKIT_ROOT_DIR=/cm/shared/apps/cudnn7.6-cuda10.2/7.6.5.32 \
-DCUDNN_LIBRARY_PATH=/cm/shared/apps/cudnn7.6-cuda10.2/7.6.5.32/lib64/libcudnn.so \
-DCUDNN_INCLUDE_PATH=/cm/shared/apps/cudnn7.6-cuda10.2/7.6.5.32/include \
-DTorch_DIR=/home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/share/cmake/Torch \
-DMKL_INCLUDE_DIR="$CONDA_PREFIX/include"

The output of lmp -h is:

Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Sep 2021 - Update 2
Git info (HEAD / patch_29Sep2021_update2-modified)

Usage example: lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

-echo none/screen/log/both  : echoing of input script (-e)
-help                       : print this help message (-h)
-in none/filename           : read input from file or stdin (default) (-i)
-kokkos on/off ...          : turn KOKKOS mode on or off (-k)
-log none/filename          : where to send log output (-l)
-mdi '<mdi flags>'          : pass flags to the MolSSI Driver Interface
-mpicolor color             : which exe in a multi-exe mpirun cmd (-m)
-cite                       : select citation reminder style (-c)
-nocite                     : disable citation reminder (-nc)
-package style ...          : invoke package command (-pk)
-partition size1 size2 ...  : assign partition sizes (-p)
-plog basename              : basename for partition logs (-pl)
-pscreen basename           : basename for partition screens (-ps)
-restart2data rfile dfile ... : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile ... 
                            : convert restart to dump file (-r2dump)
-reorder topology-specs     : processor reordering (-r)
-screen none/filename       : where to send screen output (-sc)
-skiprun                    : skip loops in run and minimize (-sr)
-suffix gpu/intel/opt/omp   : style suffix to apply (-sf)
-var varname value          : set index style variable (-v)

OS: Linux "CentOS Linux 7 (Core)" 3.10.0-1160.90.1.el7.x86_64 on x86_64

Compiler: GNU C++ 8.3.0 with OpenMP 4.5
C++ standard: C++14
MPI v3.1: MPICH Version:    3.3.2
MPICH Release date: Tue Nov 12 21:23:16 CST 2019
MPICH ABI:  13:8:1

and my mpiexec --version:

HYDRA build details:
    Version:                                 3.3.2
    Release Date:                            Tue Nov 12 21:23:16 CST 2019
    CC:                              gcc -std=gnu99  -m64 -m64 
    CXX:                             g++  -I/cm/shared/apps/gcc/current/include/c++/4.8.5/backward/backward_old -m64 -m64 
    F77:                             gfortran -m64 -m64 
    F90:                             gfortran -m64 -m64 
    Configure options:                       '--disable-option-checking' '--prefix=/cm/shared/apps/mpich/ge/gcc/64/3.3.2' '--enable-cxx' '--with-romio' '--enable-shared' '--with-comm=shared' '--disable-devdebug' 'CC=gcc -std=gnu99' 'CFLAGS=-m64 -O2' 'LDFLAGS=-m64' 'CXX=g++' 'CXXFLAGS=-I/cm/shared/apps/gcc/current/include/c++/4.8.5/backward/backward_old -m64 -O2' 'FC=gfortran' 'FCFLAGS=-m64 -O2' 'F77=gfortran' 'FFLAGS=-m64 -O2' '--cache-file=/dev/null' '--srcdir=.' 'LIBS=' 'CPPFLAGS= -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpl/include -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpl/include -I/root/rpmbuild/BUILD/mpich-3.3.2/src/openpa/src -I/root/rpmbuild/BUILD/mpich-3.3.2/src/openpa/src -D_REENTRANT -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:       
    Demux engines available:                 poll select

Would you have any insights or suggestions to resolve this issue? The torch version is 1.11 and nequip is 0.5.5.

Thank you for your assistance.

anjohan commented 1 year ago

Hi,

Thank you for your interest in Allegro!

You may be the first person to try NEB with Allegro, and I have never done NEB in LAMMPS myself, so I am not too familiar with how the processor partitioning works. It may be that our GPU assignment (https://github.com/mir-group/pair_allegro/blob/55f19d3bbdf90ef156bd6a8ac3336b1d5aa15da3/pair_allegro.cpp#L79-L88) needs to be modified for LAMMPS partitions.

Could you try to run GDB to see exactly where it is crashing? E.g.

mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac

WJiangH commented 1 year ago

Hi Anjohan,

Thank you for replying. The output below is what I get when I run: mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac

LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
(base) [wenjiang0716@tinkercliffs1 neb]$ mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

I am not very familiar with how these processes work either. Thanks again for looking into my issue.

Best, JJ

anjohan commented 1 year ago

Hi,

Is that all the output? Does it not print a stack trace of any kind?

As a sanity check to verify that this issue really is Allegro-related: Does your script work if you replace the pair_style allegro / pair_coeff model.pth Si with just Lennard-Jones?

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

WJiangH commented 1 year ago

Hi, Anjohan

Yes, that is all the output I have on my end.

The file "in.neb.sivac" I was testing comes from the LAMMPS/example/neb folder, where the pair style is:

pair_style      sw
pair_coeff * * Si.sw Si

Here, I did not use the Allegro pair style at all. The purpose was simply to test whether the compiled "lmp" can execute the "neb" command.

In my actual use case, I trained an ML model with Allegro, and it works for simulations such as structure relaxation. However, when I test "neb" in that case, it fails. For example, the command mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in neb_HEA.in outputs:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
Allegro is using device Allegro is using device Allegro is using device Allegro is using device cpucpu
cpu
cpu

Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro is using device cpu
Allegro: Loading model from ../../hea-deployed.pth
Allegro is using device cpu
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

In the neb_HEA.in file, the pair style is written as:

pair_style      allegro
pair_coeff      * * ../../hea-deployed.pth Co Cr Fe Mn Ni

I hope I have described my situation clearly; let me know if I haven't.

Best, JJ

anjohan commented 1 year ago

Hi,

Are you saying that you get this error message with an unmodified in.neb.sivac that does not reference Allegro at all? If so, this sounds like a problem with your LAMMPS installation that is unrelated to Allegro. You can ask for help on https://matsci.org/c/lammps/40 , but I would also check that your LAMMPS executable is linked to the MPI libraries corresponding to your runtime (which mpirun, ldd lmp).
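
A rough sketch of that check: compare the install prefix of the launcher with the prefix of the libmpi the executable resolves to. `same_mpi_prefix` is a hypothetical helper, and it assumes the usual `<prefix>/bin` and `<prefix>/lib` (or `lib64`) layout, so treat a mismatch as a hint rather than proof.

```shell
#!/bin/sh
# same_mpi_prefix MPIEXEC_PATH LIBMPI_PATH   (hypothetical helper)
# Succeeds when the launcher and the MPI library share an install prefix.
same_mpi_prefix() {
    exec_prefix=$(dirname "$(dirname "$1")")   # strip /bin/mpiexec
    lib_prefix=$(dirname "$(dirname "$2")")    # strip /lib/libmpi.so.N
    [ "$exec_prefix" = "$lib_prefix" ]
}

# Example use on a real system, resolving both paths at run time:
#   same_mpi_prefix "$(command -v mpiexec)" \
#       "$(ldd "$(command -v lmp)" | awk '/libmpi\.so/ {print $3; exit}')" \
#       && echo "launcher and linked MPI library share a prefix"
```

The commented example picks the third field of the first `libmpi.so` line in the ldd output, which is the resolved library path.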

WJiangH commented 1 year ago

Hi,

I checked my LAMMPS executable and the MPI libraries. Running which mpiexec gives /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/bin/mpiexec,

and ldd ~/LAMMPS/lammps/build/lmp gives

linux-vdso.so.1 =>  (0x00002aaaaaacd000)
    libmpicxx.so.12 => /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/lib/libmpicxx.so.12 (0x00002aaaaaad1000)
    libmpi.so.12 => /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/lib/libmpi.so.12 (0x00002aaaaaccf000)
    libgomp.so.1 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgomp.so.1 (0x00002aaaaaaf8000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaaaff3000)
    libpng15.so.15 => /lib64/libpng15.so.15 (0x00002aaaab20f000)
    libz.so.1 => /apps/easybuild/software/tinkercliffs-rome/zlib/1.2.11-GCCcore-8.3.0/lib/libz.so.1 (0x00002aaaaab46000)
    libtorch.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00002aaaab43a000)
    libtorch_cpu.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00002aaaab64e000)
    libtorch_cuda.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00002aaac2dd0000)
    libc10.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libc10.so (0x00002aaafdd5c000)
    libstdc++.so.6 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6 (0x00002aaafdfe2000)
    libm.so.6 => /lib64/libm.so.6 (0x00002aaafe17c000)
    libgcc_s.so.1 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgcc_s.so.1 (0x00002aaaaab62000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaafe47e000)
    libudev.so.1 => /lib64/libudev.so.1 (0x00002aaafe84c000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00002aaafea62000)
    librt.so.1 => /lib64/librt.so.1 (0x00002aaafedcc000)
    libgfortran.so.5 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgfortran.so.5 (0x00002aaafefd4000)
    libquadmath.so.0 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libquadmath.so.0 (0x00002aaaaab7d000)
    /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaff243000)
    libgomp-a34b3233.so.1 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00002aaaff447000)
    libcudart-80664282.so.10.2 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libcudart-80664282.so.10.2 (0x00002aaaff671000)
    libc10_cuda.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00002aaaff8f2000)
    libnvToolsExt-3965bdd0.so.1 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00002aaaffbbb000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00002aaaffdc5000)
    libdw.so.1 => /lib64/libdw.so.1 (0x00002aaafffca000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00002aab0021b000)
    libattr.so.1 => /lib64/libattr.so.1 (0x00002aab00441000)
    libelf.so.1 => /lib64/libelf.so.1 (0x00002aab00646000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x00002aab0085e000)

In practice, I can run mpiexec -np 6 lmp -in minimization.in to relax the structure, and it runs smoothly. Example output:

OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0.0000000 0.0000000 0.0000000) to (10.616668 10.651563 10.644024) with tilt (-0.0018335981 0.018162880 -0.0074016512)
  1 by 3 by 2 MPI processor grid
  reading atoms ...
  107 atoms
  read_data CPU = 0.021 seconds
Allegro is using device Allegro is using device cpuAllegro is using device cpuAllegro is using device cpu
Allegro is using device cpu
cpu

Allegro is using device cpu

Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
4 | Ni | 5 | Ni
 | 5 | Ni
Neighbor list info ...
  update every 1 steps, delay 0 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro, perpetual
      attributes: full, newton on, ghost
      pair build: full/bin/ghost
      stencil: full/ghost/bin/3d
      bin: standard
Setting up cg style minimization ...
  Unit style    : metal
  Current step  : 0
WARNING: Energy due to 1 extra global DOFs will be included in minimizer energies
Per MPI rank memory allocation (min/avg/max) = 4.208 | 4.210 | 4.212 Mbytes
Step Temp PotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume 
       0            0   -831.66632            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
      20            0   -831.69102            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
      29            0   -831.69233            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
Loop time of 2.93902 on 6 procs for 29 steps with 107 atoms

The error occurs only when I use the built-in 'neb' command; in that case the MPI run fails.
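
A possible next step, assuming stock LAMMPS behaviour for multi-partition runs: each partition writes its own screen.N and log.lammps.N file in the working directory (the -pscreen and -plog options change the base names), so the ERROR text that preceded MPI_Abort may be in those files rather than on the terminal. The sketch below prints any such lines. `show_partition_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# show_partition_errors [DIR]   (hypothetical helper)
# Print lines containing ERROR from per-partition LAMMPS output in DIR.
show_partition_errors() {
    dir=${1:-.}
    for f in "$dir"/screen.* "$dir"/log.lammps.*; do
        [ -f "$f" ] || continue        # skip glob patterns that matched nothing
        # /dev/null as a second file makes grep prefix each hit with the file name
        grep 'ERROR' "$f" /dev/null
    done
    return 0
}

# Run from the directory where the NEB job was launched:
show_partition_errors .
```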

Best, JJ