mir-group / pair_allegro

LAMMPS pair style for Allegro deep learning interatomic potentials with parallelization support
MIT License

Issue of running NEB with mpirun #33

Open WJiangH opened 8 months ago

WJiangH commented 8 months ago

Hello Maintainers,

I've encountered an issue after compiling pair_allegro using the provided LAMMPS version in the repository. Specifically, I'm having trouble executing the "neb" command in LAMMPS.

The command I used is: mpiexec -np 6 lmp -partition 6x1 -in in.neb.sivac

Here, in.neb.sivac comes from the examples/neb folder in LAMMPS.

The error I received is:

LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

For building lmp, I used the following command:

cmake ../cmake \
-DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \
-DCUDA_TOOLKIT_ROOT_DIR=/cm/shared/apps/cudnn7.6-cuda10.2/ \
-DCUDNN_LIBRARY_PATH=/cm/shared/apps/cudnn7.6-cuda10.2/ \
-DCUDNN_INCLUDE_PATH=/cm/shared/apps/cudnn7.6-cuda10.2/ \
-DTorch_DIR=/home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/share/cmake/Torch

Output of lmp -h:

Large-scale Atomic/Molecular Massively Parallel Simulator - 29 Sep 2021 - Update 2
Git info (HEAD / patch_29Sep2021_update2-modified)

Usage example: lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

-echo none/screen/log/both  : echoing of input script (-e)
-help                       : print this help message (-h)
-in none/filename           : read input from file or stdin (default) (-i)
-kokkos on/off ...          : turn KOKKOS mode on or off (-k)
-log none/filename          : where to send log output (-l)
-mdi '<mdi flags>'          : pass flags to the MolSSI Driver Interface
-mpicolor color             : which exe in a multi-exe mpirun cmd (-m)
-cite                       : select citation reminder style (-c)
-nocite                     : disable citation reminder (-nc)
-package style ...          : invoke package command (-pk)
-partition size1 size2 ...  : assign partition sizes (-p)
-plog basename              : basename for partition logs (-pl)
-pscreen basename           : basename for partition screens (-ps)
-restart2data rfile dfile ... : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile ... 
                            : convert restart to dump file (-r2dump)
-reorder topology-specs     : processor reordering (-r)
-screen none/filename       : where to send screen output (-sc)
-skiprun                    : skip loops in run and minimize (-sr)
-suffix gpu/intel/opt/omp   : style suffix to apply (-sf)
-var varname value          : set index style variable (-v)

OS: Linux "CentOS Linux 7 (Core)" 3.10.0-1160.90.1.el7.x86_64 on x86_64

Compiler: GNU C++ 8.3.0 with OpenMP 4.5
C++ standard: C++14
MPI v3.1: MPICH Version:    3.3.2
MPICH Release date: Tue Nov 12 21:23:16 CST 2019
MPICH ABI:  13:8:1
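The lmp -h output above omits the "Installed packages" section that LAMMPS prints near the end of its help text. The neb command belongs to the optional REPLICA package, so one thing worth ruling out (not confirmed anywhere in this thread) is an executable built without it; with CMake the package is enabled via -D PKG_REPLICA=ON. A minimal bash sketch, assuming the help text has an "Installed packages:" header, a blank line, then space-separated package names ending at the next blank line. The function name lmp_has_package is hypothetical.

```bash
#!/usr/bin/env bash
# lmp_has_package PKG  (hypothetical helper)
# Reads `lmp -h` output on stdin and succeeds only if PKG appears in the
# "Installed packages:" section. Assumed layout: header line, blank line,
# one or more lines of space-separated names, then another blank line.
lmp_has_package() {
  local pkg=$1
  awk -v pkg="$pkg" '
    /^Installed packages:/ { inlist = 1; next }   # enter the package section
    inlist && NF == 0 && seen { exit }            # blank line after the names ends it
    inlist && NF > 0 {
      seen = 1
      for (i = 1; i <= NF; i++) if ($i == pkg) { found = 1; exit }
    }
    END { exit (found ? 0 : 1) }                  # status 0 only if PKG was listed
  '
}
```

Usage: lmp -h | lmp_has_package REPLICA && echo "REPLICA is installed"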

And the output of mpiexec --version:

HYDRA build details:
    Version:                                 3.3.2
    Release Date:                            Tue Nov 12 21:23:16 CST 2019
    CC:                              gcc -std=gnu99  -m64 -m64 
    CXX:                             g++  -I/cm/shared/apps/gcc/current/include/c++/4.8.5/backward/backward_old -m64 -m64 
    F77:                             gfortran -m64 -m64 
    F90:                             gfortran -m64 -m64 
    Configure options:                       '--disable-option-checking' '--prefix=/cm/shared/apps/mpich/ge/gcc/64/3.3.2' '--enable-cxx' '--with-romio' '--enable-shared' '--with-comm=shared' '--disable-devdebug' 'CC=gcc -std=gnu99' 'CFLAGS=-m64 -O2' 'LDFLAGS=-m64' 'CXX=g++' 'CXXFLAGS=-I/cm/shared/apps/gcc/current/include/c++/4.8.5/backward/backward_old -m64 -O2' 'FC=gfortran' 'FCFLAGS=-m64 -O2' 'F77=gfortran' 'FFLAGS=-m64 -O2' '--cache-file=/dev/null' '--srcdir=.' 'LIBS=' 'CPPFLAGS= -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpl/include -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpl/include -I/root/rpmbuild/BUILD/mpich-3.3.2/src/openpa/src -I/root/rpmbuild/BUILD/mpich-3.3.2/src/openpa/src -D_REENTRANT -I/root/rpmbuild/BUILD/mpich-3.3.2/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:       
    Demux engines available:                 poll select

Would you have any insights or suggestions to resolve this issue? I am using torch 1.11 and nequip 0.5.5.

Thank you for your assistance.

anjohan commented 8 months ago


Thank you for your interest in Allegro!

You may be the first person to try NEB with Allegro, and I have never done NEB in LAMMPS myself, so I am not too familiar with how the processor partitioning works. It may be that our GPU assignment (https://github.com/mir-group/pair_allegro/blob/55f19d3bbdf90ef156bd6a8ac3336b1d5aa15da3/pair_allegro.cpp#L79-L88) needs to be modified for LAMMPS partitions.
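To make the partition question concrete: with -partition NxM, LAMMPS splits the MPI ranks into N partitions of M ranks each, and (assuming the default ordering, without -reorder) consecutive ranks land in the same partition. The sketch below prints that layout next to a hypothetical device rule, "partition-local rank modulo number of GPUs". That rule is only an illustration, not the logic in pair_allegro.cpp; it shows that under 6x1 every partition has local rank 0, so any per-partition rule of this kind would put all six replicas on the same device.

```bash
#!/usr/bin/env bash
# partition_layout NPARTS PROCS_PER_PART NGPUS  (hypothetical helper)
# For each global MPI rank, prints its partition, its rank inside that
# partition, and the device chosen by a hypothetical "local rank mod NGPUS"
# rule. Assumes ranks are assigned to partitions in consecutive blocks.
partition_layout() {
  local nparts=$1 ppp=$2 ngpus=$3
  local total=$(( nparts * ppp )) r part local_rank
  for (( r = 0; r < total; r++ )); do
    part=$(( r / ppp ))           # 0-based partition index
    local_rank=$(( r % ppp ))     # rank within the partition's communicator
    echo "rank $r -> partition $part, local rank $local_rank, device $(( local_rank % ngpus ))"
  done
}
```

For example, partition_layout 6 1 2 prints six lines, each ending in "local rank 0, device 0".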

Could you try to run GDB to see exactly where it is crashing? E.g.

mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac
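A variant of this command that may give a more useful result: one likely reason for a missing stack trace is that the process leaves through MPI_Abort and is torn down by the launcher before gdb's where runs. Setting a breakpoint on MPI_Abort stops each rank just before it aborts, and sending each rank's gdb output to its own file avoids interleaving six processes' output. This sketch assumes MPICH's Hydra launcher, which exports PMI_RANK to every process; the file name gdb_on_abort.sh and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# gdb_on_abort.sh  (hypothetical helper file)

# Per-rank log name; PMI_RANK is set by MPICH's Hydra launcher (0 if unset).
gdb_log_file() {
  echo "gdb.rank${PMI_RANK:-0}.log"
}

# Run the given command under gdb, stop when MPI_Abort is called,
# print the backtrace, then quit. All gdb output goes to this rank's log.
run_rank_under_gdb() {
  gdb -batch \
    -ex 'set breakpoint pending on' \
    -ex 'break MPI_Abort' \
    -ex run \
    -ex where \
    -ex quit \
    --args "$@" > "$(gdb_log_file)" 2>&1
}
```

Usage: mpiexec -np 6 bash -c 'source ./gdb_on_abort.sh && run_rank_under_gdb lmp -partition 6x1 -in in.neb.sivac', then inspect gdb.rank*.log.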
WJiangH commented 8 months ago

Hi Anjohan,

Thank you for replying. Below is the output when I run: mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac

LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
(base) [wenjiang0716@tinkercliffs1 neb]$ mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in in.neb.sivac
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6.0.25-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

I am not very familiar with how these processes work either. Thanks again for looking at my issue.

Best, JJ

anjohan commented 8 months ago


Is that all the output, does it not print a stack trace of any kind?

As a sanity check to verify that this issue really is Allegro-related: Does your script work if you replace the pair_style allegro / pair_coeff model.pth Si with just Lennard-Jones?

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5
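One way to run this sanity check without hand-editing the input is to generate a copy with the pair lines swapped. A minimal sed sketch, assuming pair_style and pair_coeff each start at the beginning of a line; the function name swap_pair_to_lj is hypothetical. Every pair_coeff line is replaced, and the 1 1 coefficients cover only a single atom type, so a multi-element input would need * * instead.

```bash
#!/usr/bin/env bash
# swap_pair_to_lj IN OUT  (hypothetical helper)
# Writes a copy of the LAMMPS input IN to OUT with the pair_style and
# pair_coeff lines replaced by the Lennard-Jones settings suggested above.
swap_pair_to_lj() {
  local in=$1 out=$2
  # '|' is used as the sed delimiter so the '/' in lj/cut needs no escaping
  sed -e 's|^pair_style.*|pair_style      lj/cut 2.5|' \
      -e 's|^pair_coeff.*|pair_coeff      1 1 1.0 1.0 2.5|' \
      "$in" > "$out"
}
```

Usage: swap_pair_to_lj in.neb.sivac in.neb.lj && mpiexec -np 6 lmp -partition 6x1 -in in.neb.lj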
WJiangH commented 8 months ago


Hi, Anjohan

Yes, that is all the output I have on my end.

The file "in.neb.sivac" I was testing comes from the LAMMPS examples/neb folder, where the pair style is:

pair_style      sw
pair_coeff * * Si.sw Si

Here, I did not use the allegro pair style at all. The purpose was simply to test whether the compiled lmp can run the neb command.

In my actual use case, I trained an ML model with Allegro, and it works for simulations such as structure relaxation. However, when I test the neb command in this case, it fails. For example, the command mpiexec -np 6 gdb -batch -ex=r -ex=where -ex=q --args lmp -partition 6x1 -in neb_HEA.in outputs:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
warning: File "/home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /home/wenjiang0716/anaconda3/lib/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/wenjiang0716/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/wenjiang0716/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
Missing separate debuginfo for /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5f/4fb88af97be3ecacc71363136bb015b2a07119.debug
LAMMPS (29 Sep 2021)
Running on 6 partitions of processors
Allegro is using device Allegro is using device Allegro is using device Allegro is using device cpucpu

Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro is using device cpu
Allegro: Loading model from ../../hea-deployed.pth
Allegro is using device cpu
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

where the pair style in the neb_HEA.in file is written as:

pair_style      allegro
pair_coeff      * * ../../hea-deployed.pth Co Cr Fe Mn Ni

I hope I described my situation clearly, but let me know if I didn't.

Best, JJ

anjohan commented 8 months ago


Are you saying that you get this error message with an unmodified in.neb.sivac that does not reference Allegro at all? If so, this sounds like a problem with your LAMMPS installation that is unrelated to Allegro. You can ask for help on https://matsci.org/c/lammps/40 , but I would also check that your LAMMPS executable is linked to the MPI libraries corresponding to your runtime (which mpirun, ldd lmp).
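The which/ldd comparison can be scripted. This is a rough heuristic rather than an official check: it takes the libmpi entry from ldd output and compares its install prefix (the parent of lib/) with the launcher's prefix (the parent of bin/). The function name is hypothetical, and a launcher reached through a symlink can produce a false mismatch, so resolving it with readlink -f first may help.

```bash
#!/usr/bin/env bash
# mpi_prefix_matches LAUNCHER LDD_OUTPUT  (hypothetical helper)
# Succeeds if the libmpi library resolved by ldd lives under the same
# install prefix as the MPI launcher. "Prefix" means the parent of the
# launcher's bin/ and the parent of the library's lib/ (or lib64/); this is
# a heuristic, not a guarantee that the two are ABI-compatible.
mpi_prefix_matches() {
  local launcher=$1 ldd_text=$2
  local libmpi launcher_prefix lib_prefix
  # ldd lines look like: libmpi.so.12 => /path/to/libmpi.so.12 (0x...)
  libmpi=$(printf '%s\n' "$ldd_text" | awk '$1 ~ /^libmpi\.so/ { print $3; exit }')
  [ -n "$libmpi" ] || return 1   # no libmpi entry found
  launcher_prefix=$(dirname "$(dirname "$launcher")")
  lib_prefix=$(dirname "$(dirname "$libmpi")")
  [ "$launcher_prefix" = "$lib_prefix" ]
}
```

Usage: mpi_prefix_matches "$(readlink -f "$(which mpiexec)")" "$(ldd ./lmp)" && echo "launcher and libmpi share a prefix"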

WJiangH commented 8 months ago


I checked my LAMMPS executable and the MPI libraries. Running which mpiexec gives /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/bin/mpiexec,

and ldd ~/LAMMPS/lammps/build/lmp gives

linux-vdso.so.1 =>  (0x00002aaaaaacd000)
    libmpicxx.so.12 => /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/lib/libmpicxx.so.12 (0x00002aaaaaad1000)
    libmpi.so.12 => /apps/easybuild/software/tinkercliffs-rome/MPICH/3.3.2-GCC-8.3.0/lib/libmpi.so.12 (0x00002aaaaaccf000)
    libgomp.so.1 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgomp.so.1 (0x00002aaaaaaf8000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaaaff3000)
    libpng15.so.15 => /lib64/libpng15.so.15 (0x00002aaaab20f000)
    libz.so.1 => /apps/easybuild/software/tinkercliffs-rome/zlib/1.2.11-GCCcore-8.3.0/lib/libz.so.1 (0x00002aaaaab46000)
    libtorch.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch.so (0x00002aaaab43a000)
    libtorch_cpu.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (0x00002aaaab64e000)
    libtorch_cuda.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so (0x00002aaac2dd0000)
    libc10.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libc10.so (0x00002aaafdd5c000)
    libstdc++.so.6 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libstdc++.so.6 (0x00002aaafdfe2000)
    libm.so.6 => /lib64/libm.so.6 (0x00002aaafe17c000)
    libgcc_s.so.1 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgcc_s.so.1 (0x00002aaaaab62000)
    libc.so.6 => /lib64/libc.so.6 (0x00002aaafe47e000)
    libudev.so.1 => /lib64/libudev.so.1 (0x00002aaafe84c000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00002aaafea62000)
    librt.so.1 => /lib64/librt.so.1 (0x00002aaafedcc000)
    libgfortran.so.5 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libgfortran.so.5 (0x00002aaafefd4000)
    libquadmath.so.0 => /apps/easybuild/software/tinkercliffs-rome/GCCcore/8.3.0/lib64/libquadmath.so.0 (0x00002aaaaab7d000)
    /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaff243000)
    libgomp-a34b3233.so.1 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00002aaaff447000)
    libcudart-80664282.so.10.2 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libcudart-80664282.so.10.2 (0x00002aaaff671000)
    libc10_cuda.so => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libc10_cuda.so (0x00002aaaff8f2000)
    libnvToolsExt-3965bdd0.so.1 => /home/wenjiang0716/anaconda3/envs/allegro_env/lib/python3.9/site-packages/torch/lib/libnvToolsExt-3965bdd0.so.1 (0x00002aaaffbbb000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00002aaaffdc5000)
    libdw.so.1 => /lib64/libdw.so.1 (0x00002aaafffca000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00002aab0021b000)
    libattr.so.1 => /lib64/libattr.so.1 (0x00002aab00441000)
    libelf.so.1 => /lib64/libelf.so.1 (0x00002aab00646000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x00002aab0085e000)

By contrast, I can run mpiexec -np 6 lmp -in minimization.in to relax the structure, and it runs smoothly. Example output:

OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0.0000000 0.0000000 0.0000000) to (10.616668 10.651563 10.644024) with tilt (-0.0018335981 0.018162880 -0.0074016512)
  1 by 3 by 2 MPI processor grid
  reading atoms ...
  107 atoms
  read_data CPU = 0.021 seconds
Allegro is using device Allegro is using device cpuAllegro is using device cpuAllegro is using device cpu
Allegro is using device cpu

Allegro is using device cpu

Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Loading model from ../../hea-deployed.pth
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | Co | 1 | Co
1 | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | Cr | 2 | Cr
2 | Fe | 3 | Fe
3 | Mn | 4 | Mn
4 | Ni | 5 | Ni
4 | Ni | 5 | Ni
 | 5 | Ni
Neighbor list info ...
  update every 1 steps, delay 0 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 3, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro, perpetual
      attributes: full, newton on, ghost
      pair build: full/bin/ghost
      stencil: full/ghost/bin/3d
      bin: standard
Setting up cg style minimization ...
  Unit style    : metal
  Current step  : 0
WARNING: Energy due to 1 extra global DOFs will be included in minimizer energies
Per MPI rank memory allocation (min/avg/max) = 4.208 | 4.210 | 4.212 Mbytes
Step Temp PotEng Press Pxx Pyy Pzz Pxy Pxz Pyz Lx Ly Lz Volume 
       0            0   -831.66632            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
      20            0   -831.69102            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
      29            0   -831.69233            0            0            0            0            0            0            0    10.616668    10.651563    10.644024    1203.6699 
Loop time of 2.93902 on 6 procs for 29 steps with 107 atoms

The error only occurs when I use the built-in neb command; in that case the MPI run aborts.

Best, JJ
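One thing worth checking, offered as a suggestion rather than a confirmed diagnosis: when LAMMPS runs with more than one partition, each partition writes its screen and log output to its own files (by default screen.N and log.lammps.N in the working directory, renamable with the -pscreen and -plog options listed in the lmp -h output), so the actual error message may never reach the terminal and only the MPI_Abort line is visible. A small bash sketch that collects ERROR lines from those files, assuming the default names; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# partition_errors [DIR]  (hypothetical helper)
# Prints every line containing "ERROR" from the per-partition files
# log.lammps.N and screen.N in DIR (default: current directory),
# prefixed with file name and line number. Returns non-zero if no such
# files exist or none of them contains "ERROR".
partition_errors() {
  local dir=${1:-.} f
  local files=()
  for f in "$dir"/log.lammps.* "$dir"/screen.*; do
    # Unmatched globs stay literal, so keep only files that exist
    if [ -e "$f" ]; then files+=("$f"); fi
  done
  [ "${#files[@]}" -gt 0 ] || return 1
  grep -Hn 'ERROR' "${files[@]}"
}
```

Usage: run partition_errors in the directory where mpiexec -np 6 lmp -partition 6x1 -in in.neb.sivac was launched.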