GAP with SOAP for a molecule

vvassilevg commented 4 years ago

I would like to fit with GAP the PES of molecules using SOAP. As a test, I am using a glycine molecule (500 training points). So far, I get high Mean Absolute Errors on the training set (the errors are of course even higher for the test set) for both Energies and Forces (above 0.07 eV and 0.7 ev/A, respectively).

I have tested different parameters for the gap fit command. One example is:

gap_fit at_file=train.xyz \
gap={soap cutoff=5.0 \
          covariance_type=dot_product \
          zeta=2 \
          delta=0.016 \
          atom_sigma=0.3 \
          l_max=14 \
          n_max=14 \
          n_sparse=2000 \
          sparse_method=cur_points} \
force_parameter_name=forces \
e0_method=average \
default_sigma={0.001 0.2 0.0 0.0} \
do_copy_at_file=F sparse_separate_file=F \
gp_file=gap_soap.xml

I have tried different values for l_max, n_max and cutoff.

An example of a molecule in my train.xyz files is:

10 Lattice="200.0 0.0 0.0 0.0 200.0 0.0 0.0 0.0 200.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-7735.046780 pbc="T T T" N -5.400440 5.468773 2.837348 0.197830 0.001017 0.374624 H -6.365580 4.176578 3.913868 0.136175 0.068701 -0.293331 H -3.910505 6.045725 3.925944 -0.120680 0.003343 -0.049358 C -4.498210 4.417443 0.471706 -0.433094 1.062071 -0.682114 C -2.707543 2.205386 0.405318 0.102343 -0.550705 -0.104142 O -1.920056 1.231402 -1.521152 -0.009198 0.153232 0.169461 O -2.067417 1.333725 2.758112 -0.292697 0.513378 -0.084358 H -0.893916 -0.033471 2.454792 0.283477 -0.392345 -0.057309 H -6.148205 4.006446 -0.739463 0.074871 -0.442791 0.154181 H -3.631109 5.954848 -0.713001 0.060973 -0.415900 0.572343

Energy is in eV, forces in eV/A, and in this case, positions are in a.u., but I have also trained using A.

Is there anything missing (or wrong) when setting the gap_fit command and the soap descriptor?

I will be very grateful for any help you can provide.

Best regards,

gabor1 commented 4 years ago

The positions MUST be in Angstroms, otherwise the force (in eV/A) is not the precise derivative of the energy. Make sure the forces are really forces and not gradients (as given by some quantum chemistry packages).

On your soap command:

(n_max,l_max) = (14,14) is complete overkill, and also slows everything down. set it to (12,6) to get high accuracy results, (8,4) for something quicker
you need to add the "add_species=T" command, otherwise all atomic species are ignored
This is further down, once you get the accuracy that you want, but if you want a potential that can be used for high temperature MD, you will need to do something to enforce atomic repulsion at close approach. Either an explicitly fitted 2b potential (can be added to the gap string, look at published papers), or some other baseline. In this case I recommend not using the "e0_method=average" command, but actually explicitly compute the isolated atom energies and add them to the training set (they will be picked up and used for e0 for each species)

Let me know how you get on!

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 26 Jun 2020, at 13:00, vvassilevg notifications@github.com wrote:

I would like to fit with GAP the PES of molecules using SOAP. As a test, I am using a glycine molecule (500 training points). So far, I get high Mean Absolute Errors on the training set (the errors are of course even higher for the test set) for both Energies and Forces (above 0.07 eV and 0.7 ev/A, respectively).

I have tested different parameters for the gap fit command. One example is:

gap_fit at_file=train.xyz \ gap={soap cutoff=5.0 \ covariance_type=dot_product \ zeta=2 \ delta=0.016 \ atom_sigma=0.3 \ l_max=14 \ n_max=14 \ n_sparse=2000 \ sparse_method=cur_points} \ force_parameter_name=forces \ e0_method=average \ default_sigma={0.001 0.2 0.0 0.0} \ do_copy_at_file=F sparse_separate_file=F \ gp_file=gap_soap.xml

I have tried different values for l_max, n_max and cutoff.

An example of a molecule in my train.xyz files is:

10 Lattice="200.0 0.0 0.0 0.0 200.0 0.0 0.0 0.0 200.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-7735.046780 pbc="T T T" N -5.400440 5.468773 2.837348 0.197830 0.001017 0.374624 H -6.365580 4.176578 3.913868 0.136175 0.068701 -0.293331 H -3.910505 6.045725 3.925944 -0.120680 0.003343 -0.049358 C -4.498210 4.417443 0.471706 -0.433094 1.062071 -0.682114 C -2.707543 2.205386 0.405318 0.102343 -0.550705 -0.104142 O -1.920056 1.231402 -1.521152 -0.009198 0.153232 0.169461 O -2.067417 1.333725 2.758112 -0.292697 0.513378 -0.084358 H -0.893916 -0.033471 2.454792 0.283477 -0.392345 -0.057309 H -6.148205 4.006446 -0.739463 0.074871 -0.442791 0.154181 H -3.631109 5.954848 -0.713001 0.060973 -0.415900 0.572343

Energy is in eV, forces in eV/A, and in this case, positions are in a.u., but I have also trained using A.

Is there anything missing (or wrong) when setting the gap_fit command and the soap descriptor?

I will be very grateful for any help you can provide.

Best regards,

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

vvassilevg commented 4 years ago

Dear Prof. Csányi,

Thank you for your message.

I already set my gap fit input with your suggested values of (n_max, l_max) and with "add_species=T", and I also set all positions in A. However, the results I am obtaining are practically the same (only forces improve a little).

I think that my forces are not gradients. So, do you think that there is something else missing in my input?

Best regards.

gabor1 commented 4 years ago

Show me your new command line, and the scatter plot of target vs predicted energies, and target vs predicted force components, the latter coloured according to the element

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 29 Jun 2020, at 12:16, vvassilevg notifications@github.com wrote:

Dear Prof. Csányi,

Thank you for your message.

I already set my gap fit input with your suggested values of (n_max, l_max) and with "add_species=T", and I also set all positions in A. However, the results I am obtaining are practically the same (only forces improve a little).

I think that my forces are not gradients. So, do you think that there is something else missing in my input?

Best regards.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

vvassilevg commented 4 years ago

The command is:

gap_fit at_file=train.xyz \
gap={soap cutoff=5.0 \
          covariance_type=dot_product \
          zeta=2 \
          delta=0.016 \
          atom_sigma=0.3 \
          add_species=T \
          n_max=8 \
          l_max=4 \
          n_sparse=4000 \
          sparse_method=cur_points} \
force_parameter_name=forces \
e0_method=average \
default_sigma={0.001 0.2 0.0 0.0} \
do_copy_at_file=F sparse_separate_file=F \
gp_file=gap_soap.xml

The plots are at the end of the message.

Do you think that (n_max, l_max) = (12,6) could make a huge improvement, or Is it possible that I would need to add the 2b or 3b potentials?

Eplot Fplot

gabor1 commented 4 years ago

Why are you using such a small delta? it is supposed to be typical energy per atom (really: target function value for the descriptor, but here that is energy per atom), which I would expect to be about 0.1 eV, and you have a 1/10th of that. I wouldn’t expect the n_max,l_max to be your limiting factor.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 29 Jun 2020, at 17:09, vvassilevg notifications@github.com wrote:

The command is:

gap_fit at_file=train.xyz \ gap={soap cutoff=5.0 \ covariance_type=dot_product \ zeta=2 \ delta=0.016 \ atom_sigma=0.3 \ add_species=T \ n_max=8 \ l_max=4 \ n_sparse=4000 \ sparse_method=cur_points} \ force_parameter_name=forces \ e0_method=average \ default_sigma={0.001 0.2 0.0 0.0} \ do_copy_at_file=F sparse_separate_file=F \ gp_file=gap_soap.xml

The plots are at the end of the message.

Do you think that (n_max, l_max) = (12,6) could make a huge improvement, or Is it possible that I would need to add the 2b or 3b potentials?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mcaroba commented 4 years ago

I actually think the delta should not matter since there's only one GAP, except for the relative ratio of regularization to delta which might be smudging the predictions (otherwise the alphas just get scaled). I would guess based on experience that the ratio atom_sigma to rcut might be too small for only 8 radial basis functions to resolve accurately. I would increase nmax or even better, increase atom_sigma. And increase the delta as Gábor suggested.

On Mon, 29 Jun 2020, 19:55 gabor1, notifications@github.com wrote:

Why are you using such a small delta? it is supposed to be typical energy per atom (really: target function value for the descriptor, but here that is energy per atom), which I would expect to be about 0.1 eV, and you have a 1/10th of that. I wouldn’t expect the n_max,l_max to be your limiting factor.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 29 Jun 2020, at 17:09, vvassilevg notifications@github.com wrote:

The command is:

gap_fit at_file=train.xyz \ gap={soap cutoff=5.0 \ covariance_type=dot_product \ zeta=2 \ delta=0.016 \ atom_sigma=0.3 \ add_species=T \ n_max=8 \ l_max=4 \ n_sparse=4000 \ sparse_method=cur_points} \ force_parameter_name=forces \ e0_method=average \ default_sigma={0.001 0.2 0.0 0.0} \ do_copy_at_file=F sparse_separate_file=F \ gp_file=gap_soap.xml

The plots are at the end of the message.

Do you think that (n_max, l_max) = (12,6) could make a huge improvement, or Is it possible that I would need to add the 2b or 3b potentials?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/libAtoms/QUIP/issues/213#issuecomment-651241613, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADR3Q2ECT5QEDEIKJPWOQRDRZDBRXANCNFSM4OJIKUSQ .

gabor1 commented 4 years ago

I think the delta matters because there are both energies and forces... it's best to try to stick to the heuristic. I agree that there is a rescaling of both delta and sigma that probably leaves things invariant...

n=8,l=4 is for a crude accuracy, but not this crude...

gabor1 commented 4 years ago

And the fact that the forces are not around the x=y line is the really troublesome thing

mcaroba commented 4 years ago

Yes, you are right about the deltas. I still think the atom_sigma is too small for that cutoff. Compare 0.5/3.7 for the a-C GAP to 0.3/5 for this one. Roughly twice as small and with the same number of radial functions. I think that could make for a noisy kernel.

On Mon, 29 Jun 2020, 20:37 gabor1, notifications@github.com wrote:

And the fact that the forces are not around the x=y line is the really troublesome thing

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/libAtoms/QUIP/issues/213#issuecomment-651261913, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADR3Q2AS54NX7QTGUX4STZDRZDGLZANCNFSM4OJIKUSQ .

gabor1 commented 4 years ago

In general you are right Miguel, but the problems here seem to me bigger.

vvassilevg commented 4 years ago

Thank you both for your comments.

I have tested different values of delta and the best result I have got so far is with 0.25: Energy MAE -- 0.005587 eV Force MAE -- 0.132692 eV/A

Eplot Fplot

Then I kept delta=0.25 and increased atoms_sigma to 0.5, the errors are practically the same, but slightly worse: Energy MAE -- 0.006723 eV Force MAE -- 0.143231 eV/A

Eplot Fplot

Do you think I can improve the accuracy of forces with even higher values of delta and/or using (n_max,l_max) = (12,6)?

Best regards,

gabor1 commented 4 years ago

Your force errors look much much better. you should think carefully whether these are good enough for your purposes.. you are likely hitting the locality limit. no solutions are easy to make the descriptors longer range (you can use multiple soaps, but in any case you might need a lot more data etc). maybe time to compute some observable that is more directly related to what you want and see if it is good enough?

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 30 Jun 2020, at 15:28, vvassilevg notifications@github.com wrote:

Thank you both for your comments.

I have tested different values of delta and the best result I have got so far is with 0.25: Energy MAE -- 0.005587 eV Force MAE -- 0.132692 eV/A

Then I kept delta=0.25 and increased atoms_sigma to 0.5, the errors are practically the same, but slightly worse: Energy MAE -- 0.006723 eV Force MAE -- 0.143231 eV/A

Do you think I can improve the accuracy of forces with even higher values of delta and/or using (n_max,l_max) = (12,6)?

Best regards,

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mcaroba commented 4 years ago

Indeed, it looks a lot better. It seems that I was mistaken about both deltas and the atom_sigma...

On Tue, 30 Jun 2020 at 17:31, gabor1 notifications@github.com wrote:

Your force errors look much much better. you should think carefully whether these are good enough for your purposes.. you are likely hitting the locality limit. no solutions are easy to make the descriptors longer range (you can use multiple soaps, but in any case you might need a lot more data etc). maybe time to compute some observable that is more directly related to what you want and see if it is good enough?

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 30 Jun 2020, at 15:28, vvassilevg notifications@github.com wrote:

Thank you both for your comments.

I have tested different values of delta and the best result I have got so far is with 0.25: Energy MAE -- 0.005587 eV Force MAE -- 0.132692 eV/A

Then I kept delta=0.25 and increased atoms_sigma to 0.5, the errors are practically the same, but slightly worse: Energy MAE -- 0.006723 eV Force MAE -- 0.143231 eV/A

Do you think I can improve the accuracy of forces with even higher values of delta and/or using (n_max,l_max) = (12,6)?

Best regards,

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/libAtoms/QUIP/issues/213#issuecomment-651830991, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADR3Q2CCFAYNZIIHHQWZ5ELRZHZLJANCNFSM4OJIKUSQ .

-- Dr. Miguel Caro

Academy of Finland Postdoctoral Researcher Department of Electrical Engineering and Automation and Department of Applied Physics Aalto University http://www.aalto.fi, Finland Email: mcaroba@gmail.com Work: miguel.caro@aalto.fi Website: miguelcaro.org

vvassilevg commented 4 years ago

I guess now that I know how to tune the parameters I can test the models to see how they perform and also work with other systems.

Thank you for your help with this issue.

Best regards,

libAtoms / QUIP

GAP with SOAP for a molecule #213