reducing the length of the fingerprint

libAtoms / QUIP

libAtoms/QUIP molecular dynamics framework: https://libatoms.github.io


reducing the length of the fingerprint #208

Closed. bpfrd closed this issue 4 years ago.

bpfrd commented 4 years ago

Hello, I was wondering whether the scheme that reduces the length of the fingerprint for efficiency (while retaining the quality of the fingerprint) is also implemented in GAP, and whether we can use it? Best regards, Behnam

gabor1 commented 4 years ago

Which particular "scheme" are you thinking of?

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

bpfrd commented 4 years ago

For example, CUR decomposition, farthest point sampling, ... as introduced in the paper below: "Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials". Best regards, Behnam
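Farthest point sampling, one of the schemes listed above, can be sketched in a few lines of NumPy. This is a minimal illustration assuming plain Euclidean distance between descriptor rows (one row per atomic environment); it is not taken from QUIP or from the cited paper, and `farthest_point_sampling` is a hypothetical name.

```python
import numpy as np

def farthest_point_sampling(X, n_select, start=0):
    """Greedy farthest point sampling on the rows of X (Euclidean distance).

    Starts from row `start` and repeatedly adds the row whose distance to the
    already-selected set is largest. Returns the selected row indices in order.
    """
    X = np.asarray(X, dtype=float)
    selected = [start]
    # Distance from every row to its nearest selected row so far.
    dmin = np.linalg.norm(X - X[start], axis=1)
    for _ in range(1, n_select):
        j = int(np.argmax(dmin))  # row farthest from everything chosen so far
        selected.append(j)
        dmin = np.minimum(dmin, np.linalg.norm(X - X[j], axis=1))
    return selected
```

Each step costs one pass over the data, so the selection scales linearly in the number of environments per selected point; the result depends on the starting row and on the distance used.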

gabor1 commented 4 years ago

The sparsification of SOAP elements is not yet implemented (you can do it by hand if you get the SOAP vectors into Python). CUR-based sparsification of the input environments for the purposes of sparse kernel regression is implemented ("sparse_method=cur_points"). What isn't yet implemented is the complete exclusion of input data (not just the environments as basis locations, but even the energies and forces) based on some similarity or other heuristic.
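Doing the SOAP-element selection by hand could look like the following greedy, deterministic CUR-style column selection: columns are scored by their leverage on the top-k right singular vectors, and the chosen column is projected out before the next pick. This is an illustrative sketch assuming the SOAP vectors are stacked row-wise in a NumPy array; it is not QUIP's `cur_points` code, and `cur_select_columns` is a hypothetical name. Calling it on `X.T` selects rows (environments) instead, which is the analogue of the environment selection described above.

```python
import numpy as np

def cur_select_columns(X, n_select, k=1):
    """Greedy deterministic CUR column selection.

    X: (n_samples, n_features) matrix, e.g. SOAP vectors stacked row-wise.
    n_select: number of columns (features) to keep.
    k: number of leading right singular vectors used for the leverage scores.
    Returns the indices of the selected columns, in selection order.
    """
    A = np.array(X, dtype=float, copy=True)
    selected = []
    for _ in range(n_select):
        # Leverage score of each column on the top-k right singular vectors.
        _, _, vt = np.linalg.svd(A, full_matrices=False)
        scores = np.sum(vt[:k] ** 2, axis=0)
        scores[selected] = -1.0  # never pick the same column twice
        j = int(np.argmax(scores))
        selected.append(j)
        # Remove the component of every column along the chosen column,
        # so the next pick explains what is left.
        c = A[:, j]
        norm2 = c @ c
        if norm2 > 0:
            A -= np.outer(c, c @ A) / norm2
    return selected
```

Keeping `X[:, idx]` for the returned indices gives the shortened fingerprint.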

bpfrd commented 4 years ago

Thank you very much.

bpfrd commented 4 years ago

How can I get the SOAP vectors into Python, as you mentioned? Where can I find a tutorial for that?

gabor1 commented 4 years ago

https://libatoms.github.io/GAP/quippy-descriptor-tutorial.html
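Following the descriptor tutorial linked above, the per-atom SOAP vectors of a training set can be stacked into a single matrix for further processing in Python. The sketch assumes, as in that tutorial, that `quippy.descriptors.Descriptor(...).calc(atoms)` returns a dict whose `'data'` entry holds one row per atomic environment; the descriptor string values are placeholders and the helper `soap_matrix` is hypothetical. The descriptor class is injectable so the stacking logic can be checked without quippy installed.

```python
import numpy as np

def soap_matrix(frames, soap_args, descriptor_cls=None):
    """Stack per-atom SOAP vectors of several structures into one 2D array.

    frames: iterable of ase.Atoms objects.
    soap_args: descriptor string for quippy, e.g.
        "soap cutoff=5.0 l_max=6 n_max=8 atom_sigma=0.5" (values illustrative).
    descriptor_cls: defaults to quippy.descriptors.Descriptor (assumed API).
    Returns (X, owner): X has one row per atomic environment and owner[r] is
    the index of the structure that row r came from.
    """
    if descriptor_cls is None:
        from quippy.descriptors import Descriptor as descriptor_cls  # lazy import
    desc = descriptor_cls(soap_args)
    rows, owner = [], []
    for i, at in enumerate(frames):
        d = np.atleast_2d(desc.calc(at)["data"])  # (n_environments, n_features)
        rows.append(d)
        owner.extend([i] * d.shape[0])
    return np.vstack(rows), np.array(owner)
```

For example, `X, owner = soap_matrix(ase.io.read("train.xyz", ":"), "soap cutoff=5.0 l_max=6 n_max=8 atom_sigma=0.5")`, where the file name is hypothetical.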

bpfrd commented 4 years ago

Hi, thanks for the tutorial; it was very helpful. I was wondering how you build the kernel. Suppose you have a training set with M structures and a total of N (= M × nat, if all structures have nat atoms) atomic environments. Then we calculate an M × N matrix K_ij = \sum_p e^{-d_{p,j}^2}, where i refers to the i-th structure, j refers to environment j, and p runs over all the atoms in configuration i. Do you do the same in your code? I noticed that your code is very fast even for a large training set. Or, with sparsification, do you take only a subset of Q environments as the reference and calculate an N × Q kernel? Then you solve the equation (K, grad K) alpha = (E, F), where (E, F) is a vector containing all the energies and forces in the training set, K is an M × N matrix (the kernel), and grad K is a 3N × N matrix. Is that correct? I have read a couple of your papers, including "Gaussian Approximation Potentials: a brief tutorial introduction", but it is still not clear to me which equations you solve and how you include the regularization of alpha. The paper gives formulas for the covariance of energies and forces, but not for the energies and forces directly. How do we calculate the covariance of the energies when we don't know their distribution? Best regards, Behnam
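The structure-level kernel written in the question, where each structure's row is the sum of atom-reference kernel values over its atoms, can be sketched as below. It assumes a Gaussian of the squared descriptor distance with a hypothetical length scale `sigma`, only to mirror the formula in the question; GAP fits with SOAP commonly use a dot-product kernel raised to a power instead, and the function name is illustrative. Passing a subset of Q reference environments as `X_sparse` gives an M × Q kernel rather than M × N.

```python
import numpy as np

def structure_kernel(X, owner, X_sparse, n_structures, sigma=1.0):
    """Kernel between whole structures and a set of reference environments.

    X: (N, d) descriptors of all N atomic environments in the training set.
    owner: (N,) index of the structure each environment belongs to.
    X_sparse: (Q, d) descriptors of the Q reference environments.
    Returns K of shape (n_structures, Q) with
        K[i, j] = sum over atoms p in structure i of exp(-|x_p - x_j|^2 / (2 sigma^2)).
    """
    X = np.asarray(X, dtype=float)
    X_sparse = np.asarray(X_sparse, dtype=float)
    owner = np.asarray(owner, dtype=int)
    # Squared distances between every environment and every reference point.
    d2 = (np.sum(X**2, axis=1)[:, None] + np.sum(X_sparse**2, axis=1)[None, :]
          - 2.0 * X @ X_sparse.T)
    k_atom = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))  # (N, Q)
    K = np.zeros((n_structures, X_sparse.shape[0]))
    np.add.at(K, owner, k_atom)  # add each atomic row to its structure's row
    return K
```

Summing rows per structure is the same as multiplying the atomic kernel by a 0/1 matrix that maps atoms to structures, which is how a total energy is modelled as a sum of local contributions.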

gabor1 commented 4 years ago

See section II of this paper. Let me know if that helps, or if things are still unclear. There are explicit equations giving the kernel between energies, forces and the cross terms, as well as the regularisation equations. We always use a sparse formalism because (i) otherwise the matrices get big, and (ii) it is not actually necessary to use more basis functions: the error is quickly dominated by the lack of training data, not by the basis.

https://arxiv.org/pdf/1901.10971.pdf
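For the energy-only case, the regularised sparse solve can be sketched in the textbook subset-of-regressors form, alpha = (K_QQ + K_MQ^T Λ^{-1} K_MQ)^{-1} K_MQ^T Λ^{-1} E, where Λ is the diagonal matrix of squared expected errors σ_y^2 that acts as the regulariser. This is a generic sketch under those assumptions, not a transcription of the linked paper or of gap_fit; forces would enter by stacking the negative gradient rows of the kernel under K_MQ, which is omitted here. A direct `solve` with a small jitter is used for brevity, although a QR- or Cholesky-based solve is numerically safer.

```python
import numpy as np

def sparse_gp_weights(K_MQ, K_QQ, y, sigma_y, jitter=1e-8):
    """Regularised weights for a sparse GP (subset-of-regressors form).

    K_MQ: (M, Q) kernel between the M observations and Q sparse points.
    K_QQ: (Q, Q) kernel among the sparse points.
    y: (M,) observed values, e.g. total energies.
    sigma_y: scalar or (M,) expected error of each observation; it plays the
        role of the regulariser (larger sigma_y means weaker fitting).
    Returns alpha (Q,) such that predictions are K_*Q @ alpha.
    """
    y = np.asarray(y, dtype=float)
    lam_inv = 1.0 / np.broadcast_to(np.asarray(sigma_y, dtype=float) ** 2, y.shape)
    A = K_QQ + K_MQ.T @ (lam_inv[:, None] * K_MQ)
    A = A + jitter * np.eye(A.shape[0])  # small diagonal shift for stability
    b = K_MQ.T @ (lam_inv * y)
    return np.linalg.solve(A, b)
```

Note that no distribution of the energies has to be known in advance: the GP prior is specified entirely by the kernel, and σ_y sets how closely the data are fitted.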

bpfrd commented 4 years ago

Thank you very much for the paper. It was a great help and resolved my problems. How many basis environments do you usually use? Or do you find the optimal number by monitoring the error ||D - U \Sigma V^T|| / ||D||? I was wondering if you have a Fortran implementation of CUR, so that we can compare CUR with our method for sparsification? Best regards, Behnam

gabor1 commented 4 years ago

Yes, we consider the size of the sparse basis a convergence parameter, but we test it not simply by the span of the D matrix but by the overall fit accuracy. The span of the matrix is just a proxy.

Yes, we have a Fortran implementation; it's buried inside the QUIP code. Feel free to poke around.
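The span-of-D proxy mentioned above can be sketched as the relative error of projecting every environment onto the row space of the selected ones. Sweeping the number of selected rows and watching this value level off gives the proxy; per the reply above, the actual choice should rest on the accuracy of the resulting fit. The function name is hypothetical.

```python
import numpy as np

def relative_span_error(X, idx):
    """Relative error of representing every row of X in the span of X[idx].

    Returns ||X - X P||_F / ||X||_F, where P projects onto the row space of
    the selected rows (e.g. the sparse environments).
    """
    X = np.asarray(X, dtype=float)
    R = X[idx]                      # selected rows
    P = np.linalg.pinv(R) @ R       # (d, d) projector onto the row space of R
    return np.linalg.norm(X - X @ P) / np.linalg.norm(X)
```

With an ordering from any selection method (CUR, farthest point sampling, ...), evaluating `relative_span_error(X, order[:n])` for increasing `n` traces the curve.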

bpfrd commented 4 years ago

Thank you very much for your reply. I found the cur_decomposition routine in QUIP but couldn't follow it, so I wrote a quick non-probabilistic CUR code (which doesn't work properly yet). In the meantime I found a probabilistic CUR code in Python, and when I compare the fingerprint matrix D with its CUR approximation, they agree very well even for very small k such as 5 or 10. But a basis set this small is obviously too small and may cause underfitting. Roughly what values of k should I use? D is m × n, with m = length of the fingerprint = 240 and n = number of atomic environments = 1000. Best regards, Behnam
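A minimal NumPy sketch of deterministic CUR, for comparison with the probabilistic code mentioned above. It assumes D is stored as an m × n array (fingerprint components × environments), ranks columns and rows by their leverage scores computed from the top-k singular vectors, keeps the k highest of each, and reports the relative reconstruction error ||D - C U R||_F / ||D||_F. It is an illustration only, not the Fortran routine in QUIP; the function names are hypothetical. Selecting columns picks environments; selecting rows picks fingerprint components.

```python
import numpy as np


def cur_select(D, k, rank=None):
    """Deterministic CUR: keep the k columns and k rows of D with top leverage.

    D    : (m, n) array, e.g. m fingerprint components x n environments.
    k    : number of columns (environments) and rows (components) to keep.
    rank : number of singular vectors used for the leverage scores (default k).
    Returns (col_idx, row_idx, C, U, R) with D approximately C @ U @ R.
    """
    rank = k if rank is None else rank
    u, s, vt = np.linalg.svd(D, full_matrices=False)
    # Column leverage: squared norms of the columns of the top-`rank` rows of V^T.
    col_lev = np.sum(vt[:rank, :] ** 2, axis=0)
    # Row leverage: squared norms of the rows of the top-`rank` columns of U.
    row_lev = np.sum(u[:, :rank] ** 2, axis=1)
    col_idx = np.sort(np.argsort(col_lev)[::-1][:k])
    row_idx = np.sort(np.argsort(row_lev)[::-1][:k])
    C = D[:, col_idx]
    R = D[row_idx, :]
    # Middle factor that is optimal in the Frobenius norm: U = C^+ D R^+.
    U = np.linalg.pinv(C) @ D @ np.linalg.pinv(R)
    return col_idx, row_idx, C, U, R


def relative_error(D, C, U, R):
    """||D - C U R||_F / ||D||_F."""
    return np.linalg.norm(D - C @ U @ R) / np.linalg.norm(D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a 240 x 1000 fingerprint matrix:
    # approximately rank 20 plus a little noise.
    D = rng.normal(size=(240, 20)) @ rng.normal(size=(20, 1000))
    D += 0.01 * rng.normal(size=D.shape)
    for k in (5, 10, 20, 40):
        *_, C, U, R = cur_select(D, k)
        print(k, relative_error(D, C, U, R))
```

A low reconstruction error only says that the selected columns span D well; the number of sparse environments still has to be converged against the accuracy of the final fit.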

(In reply to gabor1, Fri, Jun 12, 2020 at 3:04 PM.)

gabor1 commented 4 years ago

For selecting representative environments to fit a potential, we often use 1,000 to 10,000 environments out of hundreds of thousands in the entire database.
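Farthest point sampling, one of the schemes named at the start of this thread, is a simple alternative to CUR for picking such a subset. The NumPy sketch below assumes one descriptor vector per row and plain Euclidean distances; it is illustrative only and is not the selection code in gap_fit, where CUR selection of environments is available through sparse_method=cur_points. The function name is hypothetical.

```python
import numpy as np


def farthest_point_sampling(X, n_select, first=0):
    """Greedy farthest point sampling.

    X        : (n_env, n_feat) array, one descriptor vector per environment.
    n_select : number of environments to select.
    first    : index of the starting environment.
    Returns the selected indices in the order they were picked.
    """
    n_env = X.shape[0]
    if not 0 < n_select <= n_env:
        raise ValueError("n_select must be between 1 and the number of rows")
    selected = [first]
    # Distance from every environment to its nearest already-selected one.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))  # the point worst covered so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 30))  # stand-in for a large descriptor pool
    idx = farthest_point_sampling(X, 500)
    print(idx[:10], len(np.unique(idx)))
```

Each step makes one pass over the pool, so selecting k of n environments with d features costs O(k · n · d).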

(In reply to bpfrd, 13 Jun 2020, at 19:22.)

bpfrd commented 4 years ago

Thank you very much for your reply. I was wondering whether you also scale K_NM or K_MM before solving equation 23 in the paper. I ask because I followed the instructions in the paper and wrote my own code, but when I compare my results with GAP, the best energy and force RMSEs I can get are at least a few times larger than GAP's (sometimes 10 times larger).

(In my code I use exp(-d_ij^2 / sigma^2) as kernel(i,j) with the OM matrix fingerprint, whereas in GAP I use SOAP.) https://arxiv.org/pdf/1901.10971.pdf Best regards, Behnam
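For reference, a common form of the sparse GP (subset-of-regressors) solution is c = (K_MM + K_MN Λ^-1 K_NM)^-1 K_MN Λ^-1 y, where M indexes the sparse environments, N the observations, and Λ is the diagonal matrix of noise variances; whether this matches equation 23 of the paper term by term is an assumption. The NumPy sketch below fits total energies only: each structure's row of K_NM is the local kernel summed over its atoms, using the squared-exponential kernel from the comment above with a delta^2 prefactor. If every kernel entry is multiplied by delta^2, the predicted energies depend only on the ratio Λ / delta^2, so an unscaled kernel is equivalent to a different regularisation. Forces are omitted, and all function names are hypothetical.

```python
import numpy as np


def se_kernel(A, B, sigma, delta):
    """delta^2 * exp(-|a - b|^2 / sigma^2) between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return delta**2 * np.exp(-np.maximum(d2, 0.0) / sigma**2)


def fit_sparse_energies(descs, energies, sparse_X, sigma, delta, sigma_e):
    """Sparse GP fit of total energies written as sums of atomic energies.

    descs    : list of (n_atoms_i, n_feat) arrays, one per structure.
    energies : (n_struct,) array of total energies (the targets y).
    sparse_X : (M, n_feat) array of sparse (representative) environments.
    sigma_e  : assumed energy noise per structure; Lambda = sigma_e^2 * I.
    Returns the coefficient vector c of length M.
    """
    # Row i of K_NM: sum over the atoms of structure i of k(atom, sparse point).
    K_NM = np.stack([se_kernel(X, sparse_X, sigma, delta).sum(axis=0) for X in descs])
    K_MM = se_kernel(sparse_X, sparse_X, sigma, delta)
    lam_inv = 1.0 / sigma_e**2
    A = K_MM + lam_inv * K_NM.T @ K_NM
    b = lam_inv * K_NM.T @ energies
    # A small jitter on K_MM may be needed if sparse points nearly coincide.
    return np.linalg.solve(A, b)


def predict_energy(X, sparse_X, c, sigma, delta):
    """Predicted total energy of one structure with atomic descriptors X."""
    return se_kernel(X, sparse_X, sigma, delta).sum(axis=0) @ c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 50 structures of 8 atoms with 4-component descriptors and a
    # made-up local energy, sin(x) summed over components and atoms.
    descs = [rng.normal(size=(8, 4)) for _ in range(50)]
    energies = np.array([np.sin(X).sum() for X in descs])
    sparse_X = np.vstack(descs)[rng.choice(400, size=40, replace=False)]
    c = fit_sparse_energies(descs, energies, sparse_X, sigma=1.5, delta=1.0, sigma_e=1e-3)
    pred = np.array([predict_energy(X, sparse_X, c, 1.5, 1.0) for X in descs])
    print("training RMSE:", np.sqrt(np.mean((pred - energies) ** 2)))
```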

(In reply to gabor1, Wed, Jun 17, 2020 at 9:01 PM.)

gabor1 commented 4 years ago

Yes, there is the delta parameter: our kernels are defined to have delta^2 as their variance on the diagonal. delta has units of energy/atom, and a good heuristic is to use the binding energy per atom as delta. (But if you have a baseline, or are fitting a combined model such as 2-body + SOAP, then each kernel component gets its own delta: the 2b might have 1 eV, while SOAP might only have 0.1.)
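A short sketch of the delta convention described above, assuming each base kernel is normalised so that k(x, x) = 1: scaling by delta^2 then makes delta^2 the prior variance on the diagonal, and a combined model sums the scaled components, each with its own delta. The two base kernels, the feature arrays and the function names are hypothetical stand-ins, and the sketch ignores that in GAP different descriptors (for example pairs versus atomic environments) contribute different numbers of terms to each energy.

```python
import numpy as np


def normalised_se(X, theta):
    """exp(-|x - y|^2 / (2 theta^2)); equals 1 on the diagonal."""
    sq = np.sum(X**2, 1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / theta**2)


def normalised_dot(X, zeta):
    """(x.y / (|x| |y|))^zeta; also equals 1 on the diagonal."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (Xn @ Xn.T) ** zeta


def scaled_kernel_sum(base_kernels, deltas):
    """Sum over components of delta_c^2 * K_c.

    With every K_c normalised to 1 on the diagonal, the prior variance on the
    diagonal is the sum of the delta_c^2 (units of energy squared).
    """
    return sum(delta**2 * K for delta, K in zip(deltas, base_kernels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(6, 3))   # hypothetical features, component A
    feats_b = rng.normal(size=(6, 10))  # hypothetical features, component B
    K = scaled_kernel_sum(
        [normalised_se(feats_a, theta=0.5), normalised_dot(feats_b, zeta=4)],
        deltas=[1.0, 0.1],  # e.g. 1 eV for a 2-body-like term, 0.1 eV for SOAP
    )
    print(np.diag(K))  # each entry is 1.0**2 + 0.1**2 = 1.01 up to rounding
```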

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 18 Jun 2020, at 18:49, bpfrd notifications@github.com wrote:

Thank you very much for your reply. I was wondering if you also scale K_NM or K_MM before solving equation 23 in the paper. I am asking this because I followed the instruction in the paper and wrote a code but when I compare my results with GAP, the best rmse for energy and force I can get is at least a few times more than the rmse's in GAP (sometimes 10 times).

(I use exp(-d_ij^2/ sigma^2) for kernel(i,j) with OM matrix fingerprint in my codes, but I use SOAP in GAP.) https://arxiv.org/pdf/1901.10971.pdf Best regards Behnam

On Wed, Jun 17, 2020 at 9:01 PM gabor1 notifications@github.com wrote:

for selecting representative environments to fit a potential, we often use 1000-10,000 environments out of 100s of thousands in the entire database.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 13 Jun 2020, at 19:22, bpfrd notifications@github.com wrote:

Thank you very much for your reply. I found the cur_decomposition in QUIP but I couldn't understand and follow it. So I wrote a quick non-probabilistic cur code (which doesn't work properly yet). In the meanwhile I found some probabilistic CUR code in python and when I compare the fingerprint matrices D with CUR, they correlate very well even for very small k's such as 5 or 10. But this small number of basis set is obviously too small and may cause underfitting in the fitting. I was wondering what values of k I should use approximately? D(m,n) where m=LenOfFP=240 and n=#atomicEnvironemns=1000 Best regards BEhnam

On Fri, Jun 12, 2020 at 3:04 PM gabor1 notifications@github.com wrote:

yes, we consider the sparse basis as a convergence parameter, but we test it not for simply the span of the D matrix, but in the overall fit accuracy. the span of the matrix is just a proxy.

yes we have a fortran implmentation, it’s buried inside the quip code.. feel free to poke around.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 12 Jun 2020, at 11:26, bpfrd notifications@github.com wrote:

Thank you very much for the paper. It was was a great help and resolved my problems. How many basis set environments do you usually use? or you find the optimal number by monitoring the error=||D-U \Sigma V^T||//||D||? I was wondering if you have fortran implementation of CUR so that we can compare CUR with our method for sparsification? Best regards Behnam

On Thu, Jun 11, 2020 at 12:40 AM gabor1 notifications@github.com wrote:

See section II of this paper. Let me know if that helps, or if things are still unclear. There are explicit equations giving the kernel between energies, forces and cross terms, as well as the regularisation equations. we always use a sparse formalism, (i) otherwise the matrices get big, and (ii) it is not actually necessary to use more basis functions, the error is quickly dominated by the lack of training data, not basis

https://arxiv.org/pdf/1901.10971.pdf

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 10 Jun 2020, at 20:05, bpfrd notifications@github.com wrote:

Hi Thanks for the tutorial. It was very helpful. I was wondering how you make the kernel? I mean if you have a train set with M structures and a total number of N (=Mnat if all structures have nat atoms) atomic environments, then we calculate an MN matrix K_ij=\sump e^{-d{p,j}^2} where i refers to structure ith, j refers to environments j and p refers to all the atoms in configuration i. Do you do the same in your code? I noticed that your code is too fast for large number of training set. Or with sparsification you only take a subset of environments, lets say Q, as reference and calculate the NQ kernel? Then you solve this eq.: (K, grad K)alpha=(E,F) Where (E, F) is a vector containing all the energies and forces in the training set. K is a MN matrix (the kernel), and grad K is an 3N by N matrix. Is that correct? I read a couple of your paper including "Gaussian Approximation Potentials: a brief tutorial introduction" but it is still not clear to me what eq. you solve and how you include regularization of alpha in the eqs. There are formulas in the paper for the covariance of energies and forces but not energy and force directly. How do we calculate the covariance of energy when we don't know the distribution of them. Best regards Behnam

On Fri, Jun 5, 2020 at 1:43 AM gabor1 notifications@github.com wrote:

https://libatoms.github.io/GAP/quippy-descriptor-tutorial.html

-- Gábor


On 3 Jun 2020, at 15:55, bpfrd notifications@github.com wrote:

How can I get the SOAP vectors you mentioned into Python? Where can I find a tutorial for that?

On Tue, Jun 2, 2020 at 3:06 PM gabor1 notifications@github.com wrote:

The sparsification of SOAP elements is not yet implemented (you can do it by hand if you get the SOAP vectors into Python). CUR-based sparsification of the input environments for the purposes of sparse kernel regression is implemented ("sparse_method=cur_points"). What isn't yet implemented is the complete exclusion of input data (not just the environments as basis locations, but also the energies and forces) based on some similarity measure or other heuristic.
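For doing the SOAP-element sparsification by hand, one of the schemes raised earlier in the thread is farthest point sampling over features. Below is a minimal NumPy sketch, assuming the SOAP vectors have already been stacked into a matrix X with one row per atomic environment (for example via the quippy descriptor tutorial linked above). `fps_columns` is a hypothetical helper, not a quippy function, and in practice you may want to standardise the columns first, since raw SOAP components can differ in scale.

```python
import numpy as np


def fps_columns(X, n_select):
    """Farthest point sampling over the columns (features) of X.

    X: (n_environments, n_features) matrix, e.g. SOAP vectors stacked row-wise.
    Returns the indices of n_select features, each chosen to be as far as
    possible (Euclidean distance between columns) from those already selected.
    """
    first = int(np.argmax(np.linalg.norm(X, axis=0)))     # start from the largest column
    selected = [first]
    min_dist = np.linalg.norm(X - X[:, [first]], axis=0)  # distance to nearest selected column
    for _ in range(n_select - 1):
        j = int(np.argmax(min_dist))                      # farthest remaining feature
        selected.append(j)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[:, [j]], axis=0))
    return selected


rng = np.random.default_rng(0)
base = rng.normal(size=(100, 4))
X = np.hstack([base, base[:, :2]])  # 6 features; columns 4 and 5 duplicate 0 and 1
sel = fps_columns(X, 4)
X_reduced = X[:, sel]               # shorter "fingerprint" used downstream
print(sel, X_reduced.shape)
```

Here the duplicated columns 4 and 5 carry no new information, so FPS picks the four distinct features first; the reduced matrix `X[:, sel]` would then replace the full fingerprint.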

-- Gábor


gabor1 commented 4 years ago

can I close this?

bpfrd commented 4 years ago

yes.

On Thu, Aug 6, 2020 at 12:47 AM gabor1 notifications@github.com wrote:

can I close this?


bpfrd commented 4 years ago

I see in your paper that you use the deterministic variant of CUR decomposition. Is it correct that "deterministic CUR is slower but more accurate than stochastic CUR"? What is the advantage of deterministic CUR over the stochastic one? Best regards, Behnam

On Thu, Aug 6, 2020 at 5:42 PM gabor1 notifications@github.com wrote:

Closed #208 https://github.com/libAtoms/QUIP/issues/208.


gabor1 commented 4 years ago

are you talking about this paper: https://arxiv.org/pdf/1804.02150.pdf ?

-- Gábor

On 18 Aug 2020, at 19:25, bpfrd notifications@github.com wrote:

I see in your paper that you use the deterministic variant of CUR decomposition. Is this statement correct that "the deterministic CUR is slower but more accurate than stochastic CUR"? What is the advantage of deterministic CUR over stochastic one? Best regards Behnam


bpfrd commented 4 years ago

No, I was talking about this paper: https://arxiv.org/pdf/1901.10971.pdf Best regards Behnam

On Wed, Aug 19, 2020 at 1:52 AM gabor1 notifications@github.com wrote:

are you talking about this paper: https://arxiv.org/pdf/1804.02150.pdf ?

-- Gábor


gabor1 commented 4 years ago

I don’t really have a strong opinion either way on deterministic versus stochastic CUR. In the gap_fit code we use the stochastic version. I know that Michele Ceriotti’s group uses the deterministic version.
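To illustrate the practical difference, here is a minimal NumPy sketch of both variants for selecting rows (e.g. environments) of a feature matrix. The stochastic version computes leverage scores from the top-k left singular vectors once and samples rows with those probabilities; the deterministic version greedily takes the highest-scoring row, projects it out of the matrix and recomputes the SVD, so it costs one SVD per selected row but always returns the same rows. This is not the gap_fit implementation, sampling a fixed number of rows without replacement is a simplification, and the function names and toy data are hypothetical.

```python
import numpy as np


def leverage_scores(X, k):
    """Row leverage scores from the top-k left singular vectors, normalised to sum to 1."""
    u, _, _ = np.linalg.svd(X, full_matrices=False)
    p = np.sum(u[:, :k] ** 2, axis=1)
    return p / p.sum()


def stochastic_cur_rows(X, n, k, rng):
    """One SVD, then draw n distinct rows with probability given by their leverage."""
    p = leverage_scores(X, k)
    return [int(i) for i in rng.choice(len(X), size=n, replace=False, p=p)]


def deterministic_cur_rows(X, n, k=1):
    """Greedy: take the highest-leverage row, project it out of X, repeat."""
    X = np.array(X, dtype=float)  # work on a copy
    selected = []
    for _ in range(n):
        p = leverage_scores(X, k)  # one SVD per selected row
        p[selected] = -np.inf      # never pick a row twice
        i = int(np.argmax(p))
        selected.append(i)
        r = X[i].copy()
        if r @ r > 0:
            X -= np.outer(X @ r, r) / (r @ r)  # remove the direction of row i from every row
    return selected


def residual(X, rows):
    """Frobenius error of projecting X onto the span of the selected rows."""
    R = X[rows]
    return np.linalg.norm(X - X @ np.linalg.pinv(R) @ R)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 30))  # 200 "environments", rank 6
X += 0.01 * rng.normal(size=X.shape)                      # small full-rank noise
det = deterministic_cur_rows(X, 6)
sto = stochastic_cur_rows(X, 6, k=6, rng=rng)
print("deterministic:", det, residual(X, det))
print("stochastic:   ", sto, residual(X, sto))
```

Which variant gives the lower residual depends on the data and on the random draw, so the sketch says nothing general about one being more accurate than the other.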

-- Gábor


On 18 Aug 2020, at 23:02, bpfrd notifications@github.com wrote:

No, I was talking about this paper: https://arxiv.org/pdf/1901.10971.pdf Best regards Behnam
