mir-group / flare

An open-source Python package for creating fast and accurate interatomic potentials.
https://mir-group.github.io/flare
MIT License

DFT run complete in 0s #421

Closed rsdmse closed 1 month ago

rsdmse commented 1 month ago

A user is testing 1.4.2 (commit cb57c6b on Sep 30, 2024) using the same compilers, VASP build, input files, and Slurm script as with 1.3.3 (official release). The calculation runs fine in 1.3.3 but not in the newer version. Specifically, the first DFT call completes immediately without generating any output files:

Calling DFT...

DFT run complete.
Number of DFT calls: 1
Wall time from start: 0.00 s

which causes the subsequent command to fail:

/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/ase/md/md.py:48: FutureWarning: Specify the temperature in K using the 'temperature_K' argument
  warnings.warn(FutureWarning(w))
Traceback (most recent call last):
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/bin/flare-otf", line 8, in <module>
    sys.exit(main())
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/flare/scripts/otf_train.py", line 378, in main
    fresh_start_otf(config)
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/flare/scripts/otf_train.py", line 345, in fresh_start_otf
    otf.run()
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/flare/learners/otf.py", line 332, in run
    self.initialize_train()
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/flare/learners/otf.py", line 472, in initialize_train
    self.run_dft()
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/site-packages/flare/learners/otf.py", line 633, in run_dft
    copyfile(ofile, dest + "/" + dt_string + filename)
  File "/apps/software/standard/mpi/gcc/11.4.0/openmpi/4.1.4-nofabric/lammps_flare/29Aug2024u1_1.4.2/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'OUTCAR'

Since we have not modified the VASP installation, we don't think the DFT run failure is due to VASP. Do you know what could be causing this behavior (e.g. changes to the input file syntax)?
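
For anyone reproducing this, a quick way to confirm that the DFT step produced nothing is to check for the files named in store_dft_output before they get copied. The snippet below is only a sketch: missing_outputs is a hypothetical helper, not part of FLARE, and the temporary directory stands in for the real run directory.

```python
import os
import tempfile


def missing_outputs(filenames, directory="."):
    """Return the names in `filenames` that do not exist in `directory`.

    Hypothetical helper for illustration; not part of FLARE or ASE.
    """
    return [
        name
        for name in filenames
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Simulate a run directory in which only OSZICAR was written.
with tempfile.TemporaryDirectory() as run_dir:
    open(os.path.join(run_dir, "OSZICAR"), "w").close()
    missing = missing_outputs(["OUTCAR", "OSZICAR"], run_dir)

print(missing)
```

An empty list would mean every expected file is present. In the failing run described above, OUTCAR would be reported as missing, matching the FileNotFoundError in the traceback.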

jonpvandermause commented 1 month ago

Thanks for reporting. Could you please post the input script? I will try to reproduce.

mohan-s1 commented 1 month ago

I apologize if this is a clumsy way to share our scripts; GitHub does not appear to support attaching .yaml, .slurm, or .py files. All of the following files are placed in the same directory and submitted using the Slurm script included below.

File 1: fresh_online_10_03_24_agocl.yaml

supercell:
    file: POSCAR_ag_2o_2cl
    format: vasp
    index: 0
    replicate: [1, 1, 1]
    jitter: 0.0

flare_calc:
    gp: SGP_Wrapper
    kernels:
        - name: NormalizedDotProduct                       # select kernel for comparison of atomic environments
          sigma: 2.0                                       # signal variance, this hyperparameter will be trained, and is typically between 1 and 10.
          power: 2                                         # power of the kernel, influences body-order
    descriptors:
        - name: B2                                         # Atomic Cluster Expansion (ACE) descriptor from R. Drautz (2019). FLARE can only go from B1 up to B3 currently.
          nmax: 12                                          # Radial fidelity of the descriptor (higher value = higher cost)
          lmax: 4                                          # Angular fidelity of the descriptor (higher value = higher cost)
          cutoff_function: quadratic                       # Cutoff behavior
          radial_basis: chebyshev                          # Formalism for the radial basis functions
          cutoff_matrix: [[7.0, 4.0, 4.0],[4.0, 4.0, 4.0],[4.0, 4.0, 4.0]]                           # In angstroms. NxN array for N_species in a system.
    energy_noise: 0.096                                    # Energy noise hyperparameter, will be trained later. Typically set to 1 meV * N_atoms.
    forces_noise: 0.1                                     # Force noise hyperparameter, will be trained later. System dependent, typically between 0.05 meV/A and 0.2 meV/A.
    stress_noise: 0.001                                    # Stress noise hyperparameter, will be trained later. Typically set to 0.001 meV/A^3.
    energy_training: True
    force_training: True
    stress_training: True
    species:
        - 47 
        - 8    
        - 17                                            # Atomic numbers of the species (here, 47 = Ag, 8 = O, 17 = Cl).
    single_atom_energies:
        - 0 
        - 0   
        - 0                  # Single atom Es to bias the E prediction of the model. Can help in systems with poor initial E estimations. Length must equal the number of species.
    cutoff: 7.0                  # Cutoff for the (ACE) descriptor. Typically informed by the RDF of the system. Should equal the maximum value in the cutoff_matrix.
    variance_type: local                                   # Calculate atomic uncertainties.
    max_iterations: 20                                     # Maximum steps taken during each hyperparameter optimization call.
    use_mapping: True                                      # Print mapped model (ready for use in LAMMPS) during trajectory. Model is re-mapped and replaced if new DFT calls are made throughout the trajectory.

dft_calc:
    name: Vasp
    kwargs: 
        command: "srun vasp_gam"
        xc: PBE
        kpts: [1, 1, 1]
        istart: 0
        ediff: 1.0e-5
        encut: 400
        ismear: 0
        sigma: 0.2
        ispin: 2
        lreal: Auto
        prec: Accurate
        algo: Very_Fast
        ncore: 10
        nelm: 500
        nelmdl: -9
        nelmin: 6
        lcharg: False
        lwave: False
        lscalapack: False
    params: {}

otf:
    mode: fresh
    md_engine: PyLAMMPS
    md_kwargs: 
        command: "srun lmp"
        specorder: [Ag, O, Cl]
        dump_period: 5
        pair_style: flare
        fix: 
            - "1 all nvt temp 523 523 0.1"
        keep_alive: False
    initial_velocity: 500                                   # Initialize the velocities (units of Kelvin)
    dt: 0.001                                                # Set the time step in picoseconds ( 0.001 = 1 fs here)
    number_of_steps: 5000                                      # Total number of MD steps to be taken
    output_name: 0_ase
    init_atoms: [0] # Initial atoms to be added to the sparse set
    std_tolerance_factor: -0.05                             # The uncertainty threshold above which the DFT will be called
    max_atoms_added: -1                                      # Allow for all atoms in a given frame to be added to the sparse set if uncertainties permit
    train_hyps: [0,0]                                      # Define range in which hyperparameters will be optimized. Here, hyps are optimized at every DFT call after the 5th call.
    write_model: 3   
    store_dft_output: [[OUTCAR,OSZICAR], ./]                                      
    update_style: threshold                                  # Sparse set update style. Atoms above a defined "threshold" will be added using this method
    update_threshold: 0.01   #Threshold (Thr) if "update_style = threshold". Thr represents relative uncer-ty to mean atomic uncer-ty, where atoms above are added to sparse set
    force_only: False 
    min_steps_with_model: 1

File 2: POSCAR_ag_2o_2cl

Ag  O Cl 
 1.0000000000000000
    25.1000000000000014    0.0000000000000000    0.0000000000000000
     0.0000000000000000   25.1000000000000014    0.0000000000000000
     0.0000000000000000    0.0000000000000000   25.1000000000000014
  13   2   2
Cartesian
 12.5500000000000007 12.5500000000000007 12.5500000000000007
 14.5999999999999428 12.5500000000000007 10.5000000000000586
 14.5999999999999428 10.5000000000000586 12.5500000000000007
 14.5999999999999428 14.5999999999999428 12.5500000000000007
 14.5999999999999428 12.5500000000000007 14.5999999999999428
 16.6500000000000625 12.5500000000000007 12.5500000000000007
 10.5000000000000586 12.5500000000000007 10.5000000000000586
 12.5500000000000007 14.5999999999999428 10.5000000000000586
 12.5500000000000007 10.5000000000000586 10.5000000000000586
 12.5500000000000007 12.5500000000000007  8.4499999999999353
 10.5000000000000586 10.5000000000000586 12.5500000000000007
 12.5500000000000007 10.5000000000000586 14.5999999999999428
 12.5500000000000007  8.4499999999999353 12.5500000000000007
 16.8667008375160989 12.3696592239405732 14.7305931547166988
 10.9347671194563478 14.1739099868444587 12.2650128878102169
 14.9891129587497858 10.1595157419203890 10.2760135607241931
 13.8123448220109104 11.1801335934864152 16.4371512753725746

File 3: save_final_file_w_velocities_10_03_24.py

import numpy as np
from numpy.random import random
import matplotlib.pyplot as plt
import matplotlib
from ase.build import molecule

from ase.io import write, read
from ase.io.trajectory import TrajectoryWriter
from ase.io.lammpsdata import write_lammps_data

a = read('0_ase_md.xyz','-1')
a.write('final_otf_structure.traj')
write_lammps_data('./initial.data',a,specorder=['Ag','O','Cl'],velocities=1)

print('save file done')

File 4: fresh_gamma.slurm to submit the job to our cluster

#!/bin/bash
#SBATCH --account paolucci
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=96
#SBATCH --time=3-00:00:00
#SBATCH --partition=parallel
#SBATCH -x udc-ba04-30c1
##SBATCH --mail-type=END
#SBATCH --no-requeue

export MODULEPATH=/project/paolucci/apps/modulefiles:$MODULEPATH
module purge
module load vasp/6.3.0
module load gcc openmpi/4.1.4-nofabric lammps_flare/29Aug2024u1_1.4.2

unset VASP_SCRIPT
export VASP_COMMAND=/home/mjs7eek/setup_vasp6_gamma.sh

which python ase lmp flare-otf

set -x
flare-otf fresh_online_10_03_24_agocl.yaml && \
python save_final_file_w_velocities_10_03_24.py && \
srun lmp -in go_nvt.in && \
echo "finish!"

File 5: go_nvt.in

units          metal
atom_style     atomic
newton         on
boundary       p p p

read_data      ./initial.data

#read_restart new.rest

mass 1 107.8682
mass 2 15.999
mass 3 35.453

pair_style flare
pair_coeff * * ./lmp.flare

########################### Geo Opt ######################################
compute unc all flare/std/atom L_inv_lmp.flare sparse_desc_lmp.flare
compute MaxUnc all reduce max c_unc

#reset_timestep 0
thermo         500
thermo_style custom step time temp ke pe etotal press c_MaxUnc

############################# MD #########################################

timestep       0.001 #ps
fix nvt all nvt temp 523 523 0.1

dump           all all custom 1000 dump.nvt id type x y z fx fy fz vx vy vz c_unc
dump           pos all custom 50 dump.pos id type x y z
dump           xyz all xyz 500 dump-nvt.xyz
dump_modify    xyz element Ag O Cl
run            100000000 upto every 100000 "if '$(c_MaxUnc) > 0.1' then quit"

write_data      0.data
write_restart   0.rest
########################################################################

#fix nvt all nvt temp 523 523 0.05

#run            20000000

#write_data      1.data
#write_restart   1.rest

########################################################################
########################################################################

#fix nvt all nvt temp 523 523 0.05

#run            80000000

#write_data      2.data
#write_restart   2.rest

########################################################################

jonpvandermause commented 1 month ago

Thank you for sending your files! GitHub makes it difficult to attach files, so I think copy/pasting is the only way to do it.

I believe I've found the root cause. The run_dft method of the OTF class attempts to compute energies, forces, and stresses as follows:

        # Calculate DFT energy, forces, and stress.
        # Note that ASE and QE stresses differ by a minus sign.
        if "forces" in self.dft_calc.implemented_properties:
            if "forces" in self.atoms.calc.results:
                forces = self.atoms.get_forces()
            else:
                forces = None
        else:
            forces = None

        if "stress" in self.dft_calc.implemented_properties:
            if "stress" in self.atoms.calc.results:
                stress = self.atoms.get_stress()
            else:
                stress = None
        else:
            stress = None

        if "energy" in self.dft_calc.implemented_properties:
            if "energy" in self.atoms.calc.results:
                energy = self.atoms.get_potential_energy()
            else:
                energy = None
        else:
            energy = None

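But they're not actually getting computed here, because self.atoms.calc.results is an empty dict at this point, so none of the get_forces, get_stress, or get_potential_energy calls is ever reached and the DFT code is never launched. This is why the OUTCAR in your example isn't being generated.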
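
To make the failure mode concrete, here is a small self-contained sketch. It does not use FLARE or ASE, and it is not the change made in #423. LazyCalc is a hypothetical stand-in for a calculator that only runs when a property is first requested, and read_guarded and read_direct are illustrative names. The guarded version mirrors the checks in the snippet above; the direct version simply requests the property.

```python
class LazyCalc:
    """Hypothetical stand-in for a lazily evaluated calculator (not ASE code)."""

    implemented_properties = ["energy", "forces", "stress"]

    def __init__(self):
        self.results = {}  # stays empty until a property is requested
        self.n_runs = 0    # counts how often the "DFT code" was launched

    def get_property(self, name):
        # Launch the (fake) calculation only if the result is not cached.
        if name not in self.results:
            self.n_runs += 1
            self.results = {
                "energy": -1.0,
                "forces": [[0.0, 0.0, 0.0]],
                "stress": [0.0] * 6,
            }
        return self.results[name]


def read_guarded(calc, name):
    # Same pattern as the snippet above: read only if already in results.
    if name in calc.implemented_properties and name in calc.results:
        return calc.get_property(name)
    return None


def read_direct(calc, name):
    # Request the property directly, which triggers the calculation.
    if name in calc.implemented_properties:
        return calc.get_property(name)
    return None


guarded = LazyCalc()
guarded_forces = read_guarded(guarded, "forces")

direct = LazyCalc()
direct_forces = read_direct(direct, "forces")

print("guarded:", guarded_forces, "runs:", guarded.n_runs)
print("direct: ", direct_forces, "runs:", direct.n_runs)
```

With a fresh calculator, the guarded read returns None without launching anything, which matches the "DFT run complete" message after 0.00 s and the missing OUTCAR. The direct read launches the calculation once and returns the forces.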

I have a fix for this in #423. If the build passes, I will merge into master, and you should be good to go.

jonpvandermause commented 1 month ago

OK, the fix is now merged into master. Can you please give it a try and let me know how it goes?

rsdmse commented 1 month ago

Thank you for the quick fix! @mohan-s1 has tested that things are working great. We had a Slurm issue and the job got canceled after ~100 vasp + lmp runs. We'll confirm again if the job runs to completion.

rsdmse commented 1 month ago

Completed successfully. Thanks again!

jonpvandermause commented 1 month ago

Awesome, glad to hear it's working! Thanks for reporting the bug.