mir-group / pair_allegro

LAMMPS pair style for Allegro deep learning interatomic potentials with parallelization support
https://www.nature.com/articles/s41467-023-36329-y
MIT License

[QUESTION] Error while using potential in lammps. #6

Closed gshs12051 closed 1 year ago

gshs12051 commented 1 year ago

When running an MD simulation with MPI, LAMMPS did not proceed past this stage using mpirun -np 8 lmp -sf omp -pk omp 4 -in in.lammps:

run 10
No /omp style for force computation currently active

It works fine with mpirun -np 4 lmp -sf omp -pk omp 8 -in in.lammps, as shown below, so I am wondering whether there is a specific limit on the MPI processor grid size. Sometimes the MD simulation also ends with the error below:

  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 11.64 | 11.64 | 11.64 Mbytes
Step Temp TotEng PotEng Press Volume S/CPU CPULeft 
       0         1000   -502.17886   -517.56082    4733.1234    3471.2258            0            0 
      10    1037.6838   -502.17892   -518.14053    4911.4855    3471.2258   0.62056358    96670.196 
      20    1149.0517   -502.18269   -519.85735    5438.6038    3471.2258    3.3797967    57200.356 
      30    1366.1239   -502.20265   -523.21631    6466.0332    3471.2258    3.3869016    44029.363 
      40    1706.0198   -502.27363   -528.51555    8074.8025    3471.2258    3.3691646     37465.69 
      50      2092.37   -502.46846    -534.6532    9903.4456    3471.2258      0.80146    44927.751 
      60    2388.6437   -502.88855   -539.63056    11305.746    3471.2258   0.49348786    57677.206 
      70    2591.1369   -503.60771   -543.46446    12264.171    3471.2258   0.49194526    66832.571 
      80    2867.5918   -504.70262   -548.81179    13572.666    3471.2258   0.54046821    72327.097 
      90    3162.0488   -506.21135   -554.84985    14966.367    3471.2258   0.47370461    78332.381 
     100    3463.3768   -508.07882   -561.35234     16392.59    3471.2258   0.43357856    84302.633 
     110    3783.0973   -510.20537   -568.39681    17905.867    3471.2258    0.4624285    88399.774 
     120    4040.5194   -512.46371   -574.61481    19124.277    3471.2258    0.4751138    91522.343 
     130    3916.8145   -468.80556   -529.05384    18538.766    3471.2258   0.56733556    92585.622 
     140    4160.4834   -471.28922    -535.2856    19692.081    3471.2258   0.48887372    94704.054 
     150    5138.8348   -472.80773   -551.85307    24322.739    3471.2258   0.47169693    96834.505 
     160    5735.4544   -477.15941   -565.38192    27146.614    3471.2258   0.47489075    98642.676 

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 39872 RUNNING AT n020
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================
Linux-cpp-lisp commented 1 year ago

@anjohan

sklenard commented 1 year ago

Hello,

As far as I know, pair_allegro does not support OpenMP threading via the OPENMP package. You should instead set the OMP_NUM_THREADS environment variable explicitly to use OpenMP parallelization. Alternatively, you could use the KOKKOS package with OpenMP threads, assuming LAMMPS has been compiled with -DKokkos_ENABLE_OPENMP=yes, although I am not sure there is much benefit.
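A minimal sketch of the two approaches described above (the rank and thread counts and the input file name are assumptions carried over from the original post, not recommended values):

  # OpenMP threading via the environment, without -sf omp / -pk omp
  export OMP_NUM_THREADS=4
  mpirun -np 8 lmp -in in.lammps

  # Or, if LAMMPS was built with the KOKKOS package and -DKokkos_ENABLE_OPENMP=yes
  mpirun -np 8 lmp -k on t 4 -sf kk -in in.lammps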

Kind regards,

Benoit

Linux-cpp-lisp commented 1 year ago

It's possible that this is related to issues we've seen where one of the MPI tasks has no atoms and crashes as a result; with fewer MPI ranks, you may never hit a situation where a task has no atoms.
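If empty subdomains are indeed the trigger, one possible LAMMPS-level workaround (a sketch only, not a pair_allegro fix; the grid and rebalancing parameters below are illustrative assumptions) is to control the processor grid and load balancing from the input script so that every rank keeps some atoms:

  # Force a specific 3D processor grid instead of LAMMPS' automatic choice (8 ranks assumed)
  processors 2 2 2

  # Rebalance atoms across ranks before and periodically during the run
  balance 1.1 shift xyz 20 1.1
  fix lb all balance 1000 1.1 shift xyz 20 1.1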