mir-group / pair_allegro

LAMMPS pair style for Allegro deep learning interatomic potentials with parallelization support
https://www.nature.com/articles/s41467-023-36329-y
MIT License

"Non-numeric pressure" error (pressure = -nan) when attempting NPT in LAMMPS #53

Open samueldyoung29ctr opened 3 weeks ago

samueldyoung29ctr commented 3 weeks ago

I'm trying to do NPT calculations in LAMMPS using pair_allegro, but the Allegro model I trained is predicting -nan for the system pressure, so the NPT run fails. If I run under NVE or NVT with the press keyword included in the LAMMPS thermo logging, I see output like this:

   Step          Time           Temp          Press   
         0   0              450           -nan        
         1   0.001          450.54175     -nan        
         2   0.002          444.03951     -nan        
         3   0.003          440.5265      -nan        

If I attempt an NPT calculation, like this

# Run MD in the NPT ensemble, with a Nosé-Hoover thermostat starting at 450.0 K.
units metal
fix mynose all npt & 
    temp 450.0 450.0 0.05 &
    tchain 3 &
    iso 1.0 1.0 0.3
run 1000

I immediately get this error:

Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)

I am training on an ASE dataset with stress information stored in units of energy / length^3:

In [11]: from ase.io import read

In [12]: traj = read("./trainval.traj", index=0)

In [13]: traj.get_stress()
Out[13]:
array([ 0.00203341,  0.00251201,  0.0009121 , -0.00069527,  0.00031445,
       -0.00031556])

and I can confirm that when using nequip-evaluate for inference on this same deployed model, I do get predicted stresses in the output file.

$ nequip-evaluate --train-dir <my training dir> --test-indexes test-indices.json --model test-64bit-deployed-model.pth --output output.xyz

$ head -n 3 output.xyz
440
Lattice="16.44719886779785 0.0 0.0 0.0 16.44719886779785 0.0 0.0 0.0 16.44719886779785" Properties=species:S:1:pos:R:3:energies:R:1:forces:R:3 original_dataset_index=1 energy=-2975.787905704823 stress="0.0008169945453196768 -0.00012941664369788713 0.0009707167252817042 -0.00012941664369788713 0.0004451946509430406 -0.0010155815638519358 0.0009707167252817042 -0.0010155815638519358 -0.00377564117326529" pbc="F F F"
O       -2.82277608      -6.28169346       9.18250656      -6.48192610       0.01213487      -0.06158887      -0.59497979
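As a sanity check outside of LAMMPS, the stress written by nequip-evaluate can be read back with ASE. A minimal sketch, assuming the output.xyz produced above (note that get_stress() on the training frames returns the Voigt order xx, yy, zz, yz, xz, xy, while the stress entry here is a full 3x3 matrix; depending on ASE version the info entry may come back flat):

```python
import numpy as np
from ase.io import read

# Read the first frame written by nequip-evaluate (file name from above).
frame = read("output.xyz", index=0)

# Reshape defensively into the full 3x3 tensor (units: eV/A^3).
s = np.asarray(frame.info["stress"], dtype=float).reshape(3, 3)

print("mean stress (eV/A^3):", np.trace(s) / 3.0)
print("symmetric:", np.allclose(s, s.T))
```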

This problem occurs whether I use default_dtype: float32 in the config (with pair_style allegro3232 in the LAMMPS script) or default_dtype: float64 (with pair_style allegro). I am using NequIP 0.6.1, mir-allegro 0.2.0, and PyTorch 1.11.0 (CUDA 11.3, cuDNN 8.2.0). I compiled LAMMPS 02Aug23 against pair_allegro commit 20538c9, the commit that introduced stress support. Details of my LAMMPS build, an example of my training config (apart from the default_dtype setting), and an example NVT LAMMPS input script are here.

I am forcing deletion of any overlapping atoms prior to the NPT run, and I see no indication when running under NVT or NVE that atoms are too close together, have very high forces, or are otherwise making the simulation unstable. If I switch my LAMMPS input to use pair_style lj/cut, I am able to observe pressures in the thermo output.
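For what it's worth, a close-contact check can also be done directly in ASE instead of relying on delete_atoms. A minimal sketch, assuming the starting geometry is available as an ASE-readable file (the file name here is just a placeholder):

```python
from ase.io import read
from ase.neighborlist import neighbor_list

# Placeholder file name; substitute the actual starting geometry.
atoms = read("./trainval.traj", index=0)

# All pair distances below 1.5 A. For water, bonded O-H pairs near
# 0.96 A are expected; anything much shorter indicates an overlap.
d = neighbor_list("d", atoms, cutoff=1.5)
print("min pair distance (A):", d.min() if len(d) else "none below cutoff")
```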

Is there something obvious I'm missing about how to get pair_allegro to pass the stress predictions from my Allegro models to LAMMPS?

Thanks!

Linux-cpp-lisp commented 3 weeks ago

Hi @samueldyoung29ctr ,

Hm, odd. If you start a simulation in LAMMPS from exactly a frame that shows up in your training data, or the one that shows a stress tensor in nequip-evaluate, do you get the expected result at least on that first frame?
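One way to run that check outside LAMMPS is through the ASE calculator interface. A minimal sketch, assuming nequip 0.6.x's NequIPCalculator.from_deployed_model and reusing file names from this thread:

```python
import numpy as np
from ase.io import read
from nequip.ase import NequIPCalculator

# First training frame and the deployed model from earlier in the thread.
atoms = read("./trainval.traj", index=0)
atoms.calc = NequIPCalculator.from_deployed_model(
    model_path="test-64bit-deployed-model.pth",
    device="cpu",
    # Mapping is illustrative; adjust to the model's actual type names.
    species_to_type_name={"H": "H", "O": "O"},
)

stress = atoms.get_stress()  # Voigt order, eV/A^3
print("predicted stress:", stress)
print("contains NaN:", np.isnan(stress).any())
```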

samueldyoung29ctr commented 2 weeks ago

All of the starting geometries I launch from LAMMPS appear to show -nan pressure, even on the first step. I tried printing the virial tensor elements directly as they come from the torch model output, just after they are passed to LAMMPS.

Diagnostic printing of virial tensor elements

pair_allegro.cpp:

```diff
 if (vflag) {
   torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
   auto v = v_tensor.accessor();
   // Convert from 3x3 symmetric tensor format, which NequIP outputs, to the flattened form LAMMPS expects
   // First [0] index on v is batch
   virial[0] = v[0][0][0];
   virial[1] = v[0][1][1];
   virial[2] = v[0][2][2];
   virial[3] = v[0][0][1];
   virial[4] = v[0][0][2];
   virial[5] = v[0][1][2];
+  std::cout << "\tVirial Voigt vector: " << std::to_string(virial[0]) << ", " << std::to_string(virial[1]) << ", " << std::to_string(virial[2]) << ", " << std::to_string(virial[3]) << ", " << std::to_string(virial[4]) << ", " << std::to_string(virial[5]) << ".\n";
 }
```

pair_allegro_kokkos.cpp:

```diff
 if(vflag){
   torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
   auto v = v_tensor.accessor();
   // Convert from 3x3 symmetric tensor format, which NequIP outputs, to the flattened form LAMMPS expects
   // First [0] index on v is batch
   this->virial[0] = v[0][0][0];
   this->virial[1] = v[0][1][1];
   this->virial[2] = v[0][2][2];
   this->virial[3] = v[0][0][1];
   this->virial[4] = v[0][0][2];
   this->virial[5] = v[0][1][2];
+  std::cout << "\tVirial Voigt vector: " << std::to_string(this->virial[0]) << ", " << std::to_string(this->virial[1]) << ", " << std::to_string(this->virial[2]) << ", " << std::to_string(this->virial[3]) << ", " << std::to_string(this->virial[4]) << ", " << std::to_string(this->virial[5]) << ".\n";
 }
```

All virial components are -nan in the Kokkos-accelerated version, leading to the -nan pressure.

Outputs from running 10 steps of NVE using `pair_allegro`:

```
Per MPI rank memory allocation (min/avg/max) = 4.907 | 4.907 | 4.907 Mbytes
Step Time Temp Press PotEng KinEng TotEng E_pair E_bond Econserve Fmax
0 0 450 7123.5399 -23812.81 204.6899 -23608.121 -23812.81 0 -23608.121 4.6981153
  Virial Voigt vector: 29.758970, 45.364822, 7.063734, 3.145753, -8.978357, -9.596899.
1 0.0005 451.08768 7333.0305 -23813.316 205.18465 -23608.132 -23813.316 0 -23608.132 5.067274
  Virial Voigt vector: 39.200824, 45.856633, 13.364390, 5.394424, -21.287504, -7.662407.
2 0.001 451.81109 7584.5196 -23813.664 205.5137 -23608.15 -23813.664 0 -23608.15 4.8237129
  Virial Voigt vector: 49.247942, 45.048843, 23.481787, 7.260359, -31.039119, -4.989960.
3 0.0015 451.6869 7871.0145 -23813.617 205.45722 -23608.159 -23813.617 0 -23608.159 4.4277271
  Virial Voigt vector: 56.195390, 42.077853, 37.043569, 7.770809, -36.851748, -2.044238.
4 0.002 450.46312 8115.5436 -23813.056 204.90056 -23608.155 -23813.056 0 -23608.155 4.4915488
  Virial Voigt vector: 59.136834, 37.968314, 54.470379, 6.436822, -38.168621, 0.849517.
5 0.0025 448.37065 8329.2582 -23812.081 203.94876 -23608.133 -23812.081 0 -23608.133 4.3433504
  Virial Voigt vector: 57.057529, 36.101378, 73.077232, 3.659967, -35.406512, 3.224243.
6 0.003 446.1616 8517.6018 -23811.049 202.94394 -23608.105 -23811.049 0 -23608.105 4.5180411
  Virial Voigt vector: 48.109272, 36.321093, 88.279174, 0.041631, -27.643993, 4.904825.
7 0.0035 444.84745 8596.1772 -23810.426 202.34618 -23608.08 -23810.426 0 -23608.08 5.0060566
  Virial Voigt vector: 33.680779, 37.114159, 98.805940, -2.907780, -15.261765, 3.495411.
8 0.004 445.1429 8553.8979 -23810.565 202.48057 -23608.085 -23810.565 0 -23608.085 5.2285217
  Virial Voigt vector: 19.102289, 39.672924, 101.937126, -5.476006, -0.462937, 0.214057.
9 0.0045 446.96257 8446.2134 -23811.421 203.30828 -23608.113 -23811.421 0 -23608.113 5.5230971
  Virial Voigt vector: 9.956778, 44.490638, 97.406353, -8.867623, 14.901005, -3.847867.
10 0.005 449.42526 8347.684 -23812.577 204.42847 -23608.148 -23812.577 0 -23608.148 5.7109114
Loop time of 4.88647 on 1 procs for 10 steps with 3520 atoms
```

and I can do NPT. Output when running `pair_allegro/kk`:

```
Per MPI rank memory allocation (min/avg/max) = 3.84 | 3.84 | 3.84 Mbytes
Step Time Temp Press PotEng KinEng TotEng E_pair E_bond Econserve Fmax
0 0 450 -nan -23812.81 204.6899 -23608.121 -23812.81 0 -23608.121 4.6981153
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
1 0.0005 451.08768 -nan -23813.316 205.18465 -23608.132 -23813.316 0 -23608.132 5.067274
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
2 0.001 451.81109 -nan -23813.664 205.5137 -23608.15 -23813.664 0 -23608.15 4.8237129
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
3 0.0015 451.6869 -nan -23813.617 205.45722 -23608.159 -23813.617 0 -23608.159 4.4277271
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
4 0.002 450.46312 -nan -23813.056 204.90056 -23608.155 -23813.056 0 -23608.155 4.4915488
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
5 0.0025 448.37065 -nan -23812.081 203.94877 -23608.133 -23812.081 0 -23608.133 4.3433504
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
6 0.003 446.1616 -nan -23811.049 202.94394 -23608.105 -23811.049 0 -23608.105 4.5180411
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
7 0.0035 444.84745 -nan -23810.426 202.34618 -23608.08 -23810.426 0 -23608.08 5.0060566
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
8 0.004 445.1429 -nan -23810.565 202.48057 -23608.085 -23810.565 0 -23608.085 5.2285217
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
9 0.0045 446.96257 -nan -23811.421 203.30828 -23608.113 -23811.421 0 -23608.113 5.5230971
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
10 0.005 449.42526 -nan -23812.577 204.42847 -23608.148 -23812.577 0 -23608.148 5.7109114
Loop time of 4.88749 on 1 procs for 10 steps with 3520 atoms
```

and the same error message when attempting NPT:

```
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 10
  Time step     : 0.0005
  Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)
```
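As a cross-check on the non-Kokkos numbers, the Press column can be approximately reconstructed from the printed virial: in metal units LAMMPS reports P = N k_B T / V + tr(W) / (3V) in bar, with the virial W in eV and the box volume V in Å³ (up to LAMMPS's exact degree-of-freedom convention for the kinetic term). A minimal sketch; the volume here is a placeholder, not the actual box of the 3520-atom run:

```python
import numpy as np
from ase import units

def pressure_bar(virial_voigt_ev, volume_a3, n_atoms, temp_k):
    """Approximate LAMMPS 'press' from a Voigt virial in eV.

    Ideal-gas kinetic term plus one third of the virial trace,
    converted from eV/A^3 to bar.
    """
    kinetic = n_atoms * units.kB * temp_k / volume_a3
    configurational = np.sum(virial_voigt_ev[:3]) / (3.0 * volume_a3)
    return (kinetic + configurational) / units.bar

# Step-0 virial from the pair_allegro (non-Kokkos) run above;
# volume_a3 is a placeholder to be replaced with the real box volume.
w = np.array([29.758970, 45.364822, 7.063734, 3.145753, -8.978357, -9.596899])
print(pressure_bar(w, volume_a3=35000.0, n_atoms=3520, temp_k=450.0))
```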

Looks like the workaround is to use the non-Kokkos pair_allegro for now. I am building and linking against cxx11 libtorch 2.0.0-cpuonly, as this is what NERSC's build uses. My build of LAMMPS is CPU-only due to resource constraints on the cluster I use. I do see in pair_allegro.cpp that there is an additional compute_custom_tensor packed into the model input that doesn't exist in the Kokkos version---perhaps this has something to do with enabling virial model outputs?

https://github.com/mir-group/pair_allegro/blob/20538c9fd308bd0d066a716805f6f085a979c741/pair_allegro.cpp#L451-L455

Happy to do some more testing if you'd like.

Linux-cpp-lisp commented 2 weeks ago

My build of LAMMPS is CPU-only due to resource constraints on the cluster I use.

Aha. I don't think we've ever actually tested Kokkos pair_allegro on CPU, nor am I sure we'd expect it to have any benefit over the plain OpenMP pair_allegro on CPU. @anjohan thoughts?

Still, I guess we would have expected it to work...

compute_custom_tensor should not be relevant for virials.

samueldyoung29ctr commented 1 week ago

Also, just to clarify, we should be training Allegro models on stresses in units of energy / length^3, right? E.g., for LAMMPS metal units, we should train Allegro on stresses in units of eV/ang^3, not units of bar?
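For reference, ASE's get_stress() is already energy/volume (eV/Å³ in metal-units terms), and 1 eV/Å³ ≈ 1.602e6 bar, which is the scale at which LAMMPS metal units report pressure in bar. A quick conversion with ase.units, using the first Voigt component of the training frame shown earlier:

```python
from ase import units

# First Voigt component of the training-frame stress above (eV/A^3).
s_xx = 0.00203341

# ase.units expresses pressure units on the eV/A^3 scale, so dividing converts.
print(s_xx / units.bar)  # ~3.26e3 bar
print(1.0 / units.bar)   # ~1.602e6 bar per (eV/A^3)
```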

samueldyoung29ctr commented 1 day ago

Update: I got around to compiling LAMMPS with CUDA support, but am still seeing this issue when using Kokkos to utilize the GPUs (NVIDIA A100-SXM4-40GB GPUs, CUDA 12.4 drivers installed).

My CMake config looks like this:

CMake config

```bash
cmake ../cmake \
    -D LAMMPS_EXCEPTIONS=ON \
    -D BUILD_SHARED_LIBS=ON \
    -D BUILD_MPI=yes \
    -D BUILD_OMP=yes \
    -C ../cmake/presets/gcc.cmake \
    -C ../cmake/presets/kokkos-cuda.cmake \
    -D PKG_KOKKOS=yes \
    -D Kokkos_ARCH_ZEN3=yes \
    -D Kokkos_ARCH_PASCAL60=no \
    -D Kokkos_ARCH_AMPERE80=yes \
    -D Kokkos_ENABLE_CUDA=yes \
    -D Kokkos_ENABLE_OPENMP=yes \
    -D CUFFT_LIBRARY=$CUDA_HOME/lib64/libcufft.so \
    -D CUDA_INCLUDE_DIRS=$CUDA_HOME/include \
    -D CUDA_CUDART_LIBRARY=$CUDA_HOME/lib64/libcudart.so \
    -D CAFFE2_USE_CUDNN=ON \
    -D BUILD_TOOLS=no \
    -D FFT=FFTW3 \
    -D FFT_KOKKOS=CUFFT \
    -D FFTW3_INCLUDE_DIR=$AOCL_ROOT/include \
    -D FFTW3_LIBRARY=$AOCL_LIB/libfftw3.so \
    -D FFTW3_OMP_LIBRARY=$AOCL_LIB/libfftw3_omp.so \
    -D CMAKE_INSTALL_PREFIX="$LAMMPS_ROOT" \
    -D PKG_MANYBODY=yes \
    -D PKG_MOLECULE=yes \
    -D PKG_KSPACE=yes \
    -D PKG_REPLICA=yes \
    -D PKG_ASPHERE=yes \
    -D PKG_RIGID=yes \
    -D PKG_MPIIO=yes \
    -D PKG_COMPRESS=yes \
    -D PKG_H5MD=no \
    -D PKG_OPENMP=yes \
    -D CMAKE_POSITION_INDEPENDENT_CODE=yes \
    -D CMAKE_EXE_FLAGS="-dynamic" \
    -D CMAKE_VERBOSE_MAKEFILE=TRUE
```

I am building a non-debug version of LAMMPS because the build fails when I try to enable debugging symbols, no matter whether I use GCC or NVHPC compilers.

The Allegro model I am using was trained using NequIP 0.6.1, mir-allegro 0.2.0, and PyTorch 1.11.0 (py3.10_cuda11.3_cudnn8.2.0_0). It was also trained on a NVIDIA A100-SXM4-40GB GPU.

Allegro training config

```yaml
BesselBasis_trainable: true
PolynomialCutoff_p: 48
append: true
ase_args:
  format: traj
avg_num_neighbors: auto
batch_size: 1
chemical_symbols:
- H
- O
dataset: ase
dataset_file_name:
dataset_seed: 123456
default_dtype: float64
early_stopping_lower_bounds:
  LR: 1.0e-05
early_stopping_patiences:
  validation_loss: 100
early_stopping_upper_bounds:
  cumulative_wall: 604800.0
edge_eng_mlp_initialization: uniform
edge_eng_mlp_latent_dimensions:
- 32
edge_eng_mlp_nonlinearity: null
ema_decay: 0.999
ema_use_num_updates: true
embed_initial_edge: true
env_embed_mlp_initialization: uniform
env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_multiplicity: 64
l_max: 2
latent_mlp_initialization: uniform
latent_mlp_latent_dimensions:
- 64
- 64
- 64
- 64
latent_mlp_nonlinearity: silu
latent_resnet: true
learning_rate: 0.001
loss_coeffs:
  forces:
  - 1
  - PerSpeciesL1Loss
  stress: 5000
  total_energy:
  - 20.0
  - PerAtomL1Loss
lr_scheduler_kwargs:
  cooldown: 0
  eps: 1.0e-08
  factor: 0.9
  min_lr: 0
  mode: min
  patience: 400
  threshold: 0.0001
  threshold_mode: rel
  verbose: false
lr_scheduler_name: ReduceLROnPlateau
max_epochs: 50000
metrics_components:
- - forces
  - mae
  - PerSpecies: true
    report_per_component: false
- - forces
  - rmse
  - PerSpecies: true
    report_per_component: false
- - forces
  - rmse
- - forces
  - mae
- - total_energy
  - mae
  - PerAtom: true
- - total_energy
  - mae
  - PerAtom: true
- - total_energy
  - rmse
  - PerAtom: true
- - stress
  - mae
- - stress
  - rmse
metrics_key: validation_loss
model_builders:
- allegro.model.Allegro
- PerSpeciesRescale
- StressForceOutput
- RescaleEnergyEtc
n_train: 2250
n_val: 250
num_layers: 2
optimizer_kwargs:
  amsgrad: false
  betas: !!python/tuple
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0.0
optimizer_name: Adam
parity: o3_full
r_max: 4.0
root: results/wateronly
run_name: hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc
seed: 123456
shuffle: true
train_val_split: random
two_body_latent_mlp_initialization: uniform
two_body_latent_mlp_latent_dimensions:
- 32
- 64
two_body_latent_mlp_nonlinearity: silu
use_ema: true
verbose: debug
wandb: true
wandb_project:
```

After deploying the best model to TorchScript format, I attempted to use it in a LAMMPS NPT simulation. The input geometry is a water-only system.

geom-thermalized-298.15K.data

```
LAMMPS data file via write_data, version 2 Aug 2023, timestep = 10000, units = metal

5184 atoms
2 atom types

0 38.7494136435 xlo xhi
0 38.7494136435 ylo yhi
0 38.7494136435 zlo zhi

Masses

1 1.008
2 15.999

Atoms # atomic

4014 1 2.016863143251884 1.935372548383521 4.290501263898246 1 0 0
3979 2 2.5980102214720313 2.127124762222952 3.5158172054292423 1 0 0
3980 1 2.426726184113488 1.330444453601437 2.933484352485481 1 0 0
4235 1 0.4427546772570891 1.8248251034116987 0.3139859162106747 1 0 1
306 2 4.479768747024378 3.0437024573024676 1.0988955165290661 0 0 1
18 1 3.7138355554631155 2.7497739123432603 0.5598095055376706 0 0 0
10 2 5.838633127133152 1.38484923725688 2.427887300431939 0 0 0
1003 1 5.019716962999001 1.8773596976102618 2.24263102951684 0 1 0
57 1 5.1020184395333015 0.2780756937364116 4.697073480526178 0 0 0
58 1 4.189152573156522 1.4148146498389342 5.047283101683429 0 0 0
11 1 6.320533570258894 2.2046199287416948 2.4275733453782147 0 0 0
...
```

There are no atom overlaps in this geometry, and the LAMMPS input script attempts to do NPT.

input.lammps

```bash
# LAMMPS script for our MD systems to validate Allegro potentials

# System-wide settings
units metal
dimension 3
atom_style atomic
boundary p p p

# System geometry
# initial_frame.data will be written into the working directory where this
# script is located.
read_data ./geom-thermalized-298.15K.data

# Simulation settings
mass 1 1.008
mass 2 15.999
pair_style allegro
pair_coeff * * ./hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc.pth H O

# PART B - MOLECULAR DYNAMICS
delete_atoms overlap 0.1 all all

# Logging
thermo 1
thermo_style custom step time temp press pe ke etotal epair ebond econserve fmax

# Try to rebuild neighbor lists more often
neigh_modify every 1 delay 0 check yes binsize 10.0

# Also try to specify larger cutoff for ghost atoms to avoid losing atoms.
comm_modify mode single cutoff 10.0 vel yes

# Try specifying initial velocities for all atoms
velocity all create 298.15 3127835 dist gaussian

# Run MD in the NPT ensemble, with a Nosé-Hoover thermostat starting at 298.15 K and a barostat starting at 1.01325 bar.
fix mynose all npt &
    temp 298.15 298.15 0.011 &
    tchain 3 &
    iso 1.01325 1.01325 0.03

# Be sure to dump the MD trajectory
dump mdtraj all atom 40 mdtraj.lammpstrj
dump mdforces all custom 40 mdforces.lammpstrj id x y z vx vy vz fx fy fz

timestep 0.0005

# Set up binary restart dumps every 1000 steps in case something goes wrong.
restart 1000 step-*.restart

# Normal run, with a single balance first
balance 1.0 shift xyz 100 1.0
run 20000

undump mdtraj
undump mdforces

# Finally, write out the final geometry of the system
write_data geom-equilibrated-1atm.data
```

I invoke LAMMPS with Kokkos:

srun --cpu-bind=cores --gpu-bind=none lmp -k on g 4 -sf kk -pk kokkos neigh full newton on -in input.lammps

This results in the following error:

LAMMPS stdout

```
Module cudnn/linux-x86_64-9.2.0.82_cuda11 is loading...
Module libtorch/1.13.1+cu116-cxx11-abi is loading...
Module lammps-tpc/2Aug23 is loading...
Loading lammps-tpc/2Aug23/gcc10-allegro-gpu
  Loading requirement: slurm scl/gcc-toolset-10 amd/aocl/gcc/4.0 cuda/cuda-11.6
    penguin/openmpi/4.1.4/gcc intel/tbb/2021.12 intel/compiler-rt/2024.1.0
    intel/mkl/2024.1 intel/oneapi-2024.1.0-mkl cudnn/linux-x86_64-9.2.0.82_cuda11
    libtorch/1.13.1+cu116-cxx11-abi
LAMMPS (2 Aug 2023 - Update 3)
KOKKOS mode with Kokkos version 3.7.2 is enabled (src/KOKKOS/kokkos.cpp:108)
  will use up to 4 GPU(s) per node
WARNING: Turning off GPU-aware MPI since it is not detected, use '-pk kokkos gpu/aware on' to override (src/KOKKOS/kokkos.cpp:316)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  orthogonal box = (0 0 0) to (38.749414 38.749414 38.749414)
  1 by 2 by 2 MPI processor grid
  reading atoms ...
  5184 atoms
  reading velocities ...
  5184 velocities
  read_data CPU = 0.030 seconds
Allegro is using input precision f and output precision d
Allegro: Loading model from ./hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc.pth
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | H | 1 | H
1 | O | 2 | O
ti=0 tj=0 cut=4.00
ti=0 tj=1 cut=4.00
ti=1 tj=0 cut=4.00
ti=1 tj=1 cut=4.00
System init for delete_atoms ...
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 6, bins = 7 7 7
  2 neighbor lists, perpetual/occasional/extra = 1 1 0
  (1) command delete_atoms, occasional, copy from (2)
      attributes: full, newton on
      pair build: copy/kk/device
      stencil: none
      bin: none
  (2) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
Deleted 0 atoms, new total = 5184
Balancing ...
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 10
  binsize = 10, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
 rebalancing time: 0.004 seconds
  iteration count = 26
  initial/final maximal load/proc = 1369 1329
  initial/final imbalance factor  = 1.0563272 1.025463
  x cuts: 0 1
  y cuts: 0 0.4954834 1
  z cuts: 0 0.49346924 1
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)
Last command: run 20000
```

If I instead change to NVT like this:

fix mynose all nvt & 
    temp 298.15 298.15 0.011 &
    tchain 3 &
    # iso 1.01325 1.01325 0.03

and again run with Kokkos, the run starts, but with nan as the pressure:

LAMMPS stdout with NVT, using Kokkos

```
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 3.543 | 3.546 | 3.549 Mbytes
Step Time Temp Press PotEng ...
0 0 298.15 nan -29784.777 ...
1 0.0005 296.93805 nan -29783.946 ...
2 0.001 296.03497 nan -29783.33 ...
3 0.0015 295.55724 nan -29783.005 ...
4 0.002 295.38153 nan -29782.879 ...
5 0.0025 295.32922 nan -29782.824 ...
```

If I take this same job, with the same GPU build of LAMMPS, and run it CPU-only without Kokkos:

srun --cpu-bind=cores --gpu-bind=none lmp -in input.lammps

then pressures are calculated (although they are quite high):

LAMMPS stdout with NVT, no Kokkos

```
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 3.539 | 3.545 | 3.551 Mbytes
Step Time Temp Press PotEng ...
0 0 298.15 4406.7395 -29784.805 ...
1 0.0005 299.15125 4408.8153 -29785.494 ...
2 0.001 300.23769 4425.8546 -29786.253 ...
3 0.0015 300.77332 4476.0844 -29786.639 ...
4 0.002 300.32951 4566.4221 -29786.356 ...
5 0.0025 298.96391 4682.4999 -29785.437 ...
6 0.003 297.23743 4794.6713 -29784.27 ...
```

Any advice on what to try? I have been using LAMMPS 02Aug23 since the folks at NERSC have used that version for their LAMMPS+pair_allegro installation, but is there a different LAMMPS release you recommend using?

The admins on my cluster are also going to fix CUDA 12.4, so I should be able to build against more recent CUDA and related libraries in the next few weeks.

Thanks!


Update 13 Sep 2024: The nan pressures persist even when compiling with the latest LAMMPS development branch (i.e., Git commit 2995cb7 from doing git clone --depth=1 https://github.com/lammps/lammps).