mir-group / flare

An open-source Python package for creating fast and accurate interatomic potentials.
https://mir-group.github.io/flare
MIT License

FLARE's capability to model small molecules, reactive force field, and OTF's convergence criteria. #267

Closed aaronchen0316 closed 3 years ago

aaronchen0316 commented 3 years ago

Hi,

I am wondering what your thoughts are on using FLARE to train on a system made up of a cluster of small molecules (e.g. H2O, CO2, or an organic solvent), and whether a GP is able to capture the interactions involved in diffusion and reaction processes. You showed a very helpful methanol example in your GP from AIMD tutorial, and I have been following that. But how accurate and efficient will it be for a much larger system? Also, can FLARE (or a GP) learn diffusion or even bond breaking/forming reactions (essentially functioning as a reactive force field)?

Your papers present many good examples and tests on crystalline systems, but I just haven't seen much literature on the systems I mentioned above. I am curious what your expert opinion is.

---OTF convergence question--- I have done some OTF training on crystal/metal systems and achieved good training results (with MAE around 50-70 meV/A across different temperature ranges). However, when I tried to use LAMMPS with MGP to simulate a large system with a thermostat, the simulation often failed. What hallmarks (modeling time, MAE variation, etc.) should I keep monitoring to determine the convergence of OTF?

I would really appreciate any suggestions. Thank you so much for your time and work on FLARE!

YuuuXie commented 3 years ago

Hi Aaron,

For the training of small molecules, probably @bduschatko can answer this.

As for the LAMMPS simulation, I don't think the MAE is the problem, since 50-70 meV/A seems reasonable. What error did you see when using MGP in LAMMPS?

Thanks, Yu

mkrompiec commented 3 years ago

Hi Aaron, from my limited experience (I am not a FLARE developer), I wouldn't expect a model to work after a single training pass. Unless you did some magic with your training set (e.g. through very fortunate OTF learning), your simulation will definitely blow up sooner rather than later, and that is perfectly fine. Collect a couple of frames with "distorted" geometries (just before things blow up), calculate energies and forces with your "ground truth" method, and add them to your training set. Retrain, test in LAMMPS or ASE, add "interesting" geometries to your training set, etc. Repeat until the system behaves reasonably. I tried FLARE on a couple of small molecules and the results were very encouraging, but it was necessary to set up separate hyperparameters for each kernel (that is, for each pair and triplet in the 2-body and 3-body kernels).
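The collect, label, retrain, and test cycle described above can be sketched roughly as follows. This is only an illustration of the control flow: `refine_training_set` and the callables `run_md`, `label`, and `retrain` are hypothetical placeholders (not FLARE, LAMMPS, or ASE functions), and the sketch assumes `run_md` returns the trajectory truncated at the point of failure.

```python
def refine_training_set(training_set, run_md, label, retrain,
                        max_rounds=5, frames_per_round=3):
    """Iteratively extend a training set with frames taken just before MD fails.

    All callables are hypothetical stand-ins supplied by the user:
      run_md(model)  -> (trajectory, blew_up)  test MD run with the current model
      label(frame)   -> labelled frame         "ground truth" energies and forces
      retrain(data)  -> model                  refit the model on the data
    """
    training_set = list(training_set)
    model = retrain(training_set)
    for _ in range(max_rounds):
        trajectory, blew_up = run_md(model)
        if not blew_up:
            break  # the system behaves reasonably, stop refining
        # Take the last few frames before the failure and label them.
        distorted = trajectory[-frames_per_round:]
        training_set.extend(label(frame) for frame in distorted)
        model = retrain(training_set)
    return model, training_set


# Toy stand-ins so the sketch runs: the "model" is just the data set size,
# and the toy MD run "blows up" until at least 6 labelled frames are present.
def toy_retrain(data):
    return len(data)

def toy_run_md(model):
    return list(range(10)), model < 6

def toy_label(frame):
    return (frame, 0.0)

model, data = refine_training_set(
    [("a", 0.0), ("b", 0.0)], toy_run_md, toy_label, toy_retrain,
    max_rounds=5, frames_per_round=2,
)
```

In a real workflow the three callables would wrap the MD engine, the DFT code, and the GP training step; the `max_rounds` cap is there only so the loop cannot run forever.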

Best, Michal

aaronchen0316 commented 3 years ago

Hi @YuuuXie,

There is no LAMMPS error message from using the MGP pair style. The issue is the "wrong" behavior of the system. For example, I re-ran the Al melt from your Materials Cloud Archive, mapped the GP to the MGP pair style, and ran a much larger Al system with increasing temperature in the NPT ensemble. The thermo output and other physical properties (e.g. density) were okay after a sanity check. However, when I followed the same steps for my metal melt (I used ASE_OTF for this one), the atoms quickly blew apart in LAMMPS in the NPT ensemble, even though the OTF training error was low across the different temperature ranges. One thing I noticed was that the pressure in the NVE or NVT ensemble was extremely high. That's why I am wondering whether my training procedure could be wrong. Maybe the std_tolerance is too high, maybe the number of atoms is too small so that the lack of atomic environments causes GP overfitting, or maybe I should use OTF instead of ASE_OTF for now?

Again, thank you very much for your help!

aaronchen0316 commented 3 years ago

Hi @mkrompiec,

Thank you for your input! Yes, I noticed absurd updates at the beginning of OTF when I tried to train on some small-molecule clusters or even reactive systems. I figured that the GP wasn't well trained and that it gave bad MD updates to the system, which could crash my DFT calls. For now I have been trying to train the GP from an AIMD trajectory so that the GP has enough data points to start with.

Regarding the separate hyperparameters for small-molecule training, are you suggesting that I manually specify the hyperparameters for each permutation of the 2-body/3-body kernels, so that the GP has different cutoffs and parameters for bonded and nonbonded interactions (or for the interactions I am most interested in)?

Thank you! Aaron

mkrompiec commented 3 years ago

@aaronchen0316 Yes, see https://flare.readthedocs.io/en/latest/flare/kernels/mc_sephyps.html
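To give a feel for how many separate hyperparameter sets "one per pair and triplet" implies, here is a small standalone sketch. It does not use FLARE's API: `kernel_groups` is a hypothetical helper, the hyperparameter names and values are placeholders, and treating triplets as unordered is a simplifying assumption (FLARE's own grouping may differ, for example by distinguishing the central atom).

```python
from itertools import combinations_with_replacement


def kernel_groups(species):
    """Enumerate unordered species pairs and triplets (hypothetical helper)."""
    ordered = sorted(species)
    pairs = list(combinations_with_replacement(ordered, 2))
    triplets = list(combinations_with_replacement(ordered, 3))
    return pairs, triplets


pairs, triplets = kernel_groups(["H", "C", "O"])

# One placeholder hyperparameter set per group. The keys and values below are
# illustrative only and would be replaced by tuned values in practice.
pair_hyps = {pair: {"signal_std": 1.0, "length_scale": 1.0} for pair in pairs}
triplet_hyps = {trip: {"signal_std": 1.0, "length_scale": 1.0} for trip in triplets}

print(len(pairs), "pair groups;", len(triplets), "triplet groups")
```

Even for three species this already gives 6 pair groups and 10 triplet groups, so in practice it may be worth merging groups that are expected to behave similarly.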

YuuuXie commented 3 years ago

Based on my experience, the blow-up of the system is usually because the interatomic distance becomes too small, such that it goes below the lower bound of MGP.

Are you using the latest MGP pair style code in our master branch? I updated it a few weeks ago, adding some error info when the interatomic distance becomes lower than the MGP lower bound. I would expect the MD to be interrupted before blowing up in your system then.

If this is the case, you can try increasing lower_bound_relax (the default is 0.1). The MGP lower bound is defined as the smallest interatomic distance in the training set minus lower_bound_relax, so you can try lower_bound_relax=0.5 or a larger value. You should also add some frames with such close interatomic distances to your training set, since the original training set has never seen such close configurations.

As for ASE_OTF, I think @jonpvandermause has been using it recently and can probably comment on it.
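The lower-bound definition above amounts to a one-line formula. The sketch below is not FLARE's implementation; it only illustrates the arithmetic. The function names are hypothetical, only `lower_bound_relax` comes from the discussion, and periodic images are ignored for simplicity.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def min_interatomic_distance(frames):
    """Smallest pairwise distance over all frames (no periodic images)."""
    return min(
        dist(a, b)
        for positions in frames
        for a, b in combinations(positions, 2)
    )


def mgp_lower_bound(frames, lower_bound_relax=0.1):
    """Smallest training-set distance minus lower_bound_relax, as described above."""
    return min_interatomic_distance(frames) - lower_bound_relax


# Two toy "training frames" given as lists of Cartesian positions.
frames = [
    [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],
]

default_bound = mgp_lower_bound(frames)       # uses lower_bound_relax=0.1
relaxed_bound = mgp_lower_bound(frames, 0.5)  # more room below the training data
```

Comparing the smallest distance seen in a failing MD run against this bound is a quick way to check whether the blow-up is caused by atoms approaching more closely than anything in the training set.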

aaronchen0316 commented 3 years ago

I haven't recompiled my LAMMPS with the latest MGP pair style code yet, but I will certainly update it. Thank you for your input on building the MGP and adding to the training set!