Multi-body descriptor fails when only one atom of a type is present

gelzinyte commented 4 years ago

For example, I was trying to fit a 2-body distance descriptor to a methane dataset:

gap_fit atoms_filename=ch4.xyz sparse_separate_file=F gap={distance_2b cutoff=3.0 covariance_type=ard_se delta=2 n_sparse=10 theta_uniform=0.8 sparse_method=uniform add_species} default_sigma={0.01 0.1 0.0 0.0} sparse_jitter=1e-10

and calling quip with

quip E=T F=T atoms_filename=ch4.xyz param_filename=gap_new.xml

gave the following error:

libAtoms::Hello World: Random Seed = 42979504
libAtoms::Hello World: global verbosity = 0

Calls to system_timer will do nothing by default

Using calc args:
Using pre-relax calc args:
Using param_filename: gap_new.xml
Using init args:
WARNING: Potential_initialise using default init_args "Potential xml_label=GAP_2020_3_31_0_11_55_46_127"
 ** On entry to DPOTRF parameter number  4 had an illegal value

Explicitly specifying only the H-H and C-H two-body distance descriptors does work, though, with the command:

gap_fit atoms_filename=ch4.xyz sparse_separate_file=F gap={distance_2b cutoff=3.0 covariance_type=ard_se delta=2 n_sparse=10 theta_uniform=0.8 sparse_method=uniform Z1=1 Z2=1:distance_2b cutoff=3.0 covariance_type=ard_se delta=2 n_sparse=10 theta_uniform=0.8 sparse_method=uniform Z1=1 Z2=6} default_sigma={0.01 0.1 0.0 0.0} sparse_jitter=1e-10

This ch4.xyz file has only one CH4 configuration, plus an isolated H atom and an isolated C atom:

5
Properties=species:S:1:pos:R:3:momenta:R:3:force:R:3 energy=-87.6724937536763 free_energy=-87.6724937536763 pbc="F F F"
C        0.01058296       0.00116513      -0.00455365      -0.11788804       0.26384526       0.17658816       0.13768386       0.71645896       1.51950028
H        0.67374282       0.63491379       0.67264986      -0.08368281      -0.14181943      -0.21868352      -0.82127770      -0.67104046      -1.15775846
H       -0.68776892      -0.53128520       0.62488695       0.00110060      -0.15097780       0.08954817       0.18207016      -0.58119689       0.39921944
H        0.56243205      -0.66918947      -0.63086335       0.11738151      -0.06315389      -0.00400368       0.46436643      -0.40590210      -0.18573823
H       -0.67450901       0.55167763      -0.61241370       0.08308874       0.09210586      -0.04344914       0.03715725       0.94168048      -0.57522303
1
Lattice="20.0 0.0 0.0 0.0 20.0 0.0 0.0 0.0 20.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-6.492647588926119 free_energy=-6.492647588926119 pbc="T T T"
H        0.00000000       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000
1
Lattice="20.0 0.0 0.0 0.0 20.0 0.0 0.0 0.0 20.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-38.054950833135265 free_energy=-38.054950833135265 pbc="T T T"
C        0.00000000       0.00000000       0.00000000       0.00000000       0.00000000       0.00000000

-Elena
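To see why one of the automatically generated pair descriptors has nothing to sparsify, one can count interatomic distances per species pair in this dataset. The Python sketch below does that for the three frames above (positions copied from ch4.xyz). It assumes the 3.0 Å cutoff from the gap_fit command, counts each unordered pair once, and ignores periodic images, which is safe here only because the 20 Å boxes are much larger than the cutoff. It is an illustration, not how QUIP itself builds descriptors.

```python
import itertools
import math
from collections import Counter

# Frames copied from ch4.xyz: (species, x, y, z) per atom, positions in Å.
FRAMES = [
    [("C", 0.01058296, 0.00116513, -0.00455365),
     ("H", 0.67374282, 0.63491379, 0.67264986),
     ("H", -0.68776892, -0.53128520, 0.62488695),
     ("H", 0.56243205, -0.66918947, -0.63086335),
     ("H", -0.67450901, 0.55167763, -0.61241370)],
    [("H", 0.0, 0.0, 0.0)],  # isolated H atom
    [("C", 0.0, 0.0, 0.0)],  # isolated C atom
]


def pair_counts(frames, cutoff):
    """Count unordered species-pair distances shorter than `cutoff`.

    Periodic images are ignored (assumption: box size >> cutoff)."""
    counts = Counter()
    for atoms in frames:
        for (s1, *r1), (s2, *r2) in itertools.combinations(atoms, 2):
            if math.dist(r1, r2) < cutoff:
                counts[tuple(sorted((s1, s2)))] += 1
    return counts


counts = pair_counts(FRAMES, cutoff=3.0)
for pair in [("C", "C"), ("C", "H"), ("H", "H")]:
    print("-".join(pair), counts[pair])
# C-C 0
# C-H 4
# H-H 6
```

With a single carbon per configuration there is no C-C distance at all, so a C-C descriptor created by add_species has zero environments to choose sparse points from.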

bernstei commented 4 years ago

(In reply to gelzinyte, Mar 31, 2020, 8:31 AM)

n_sparse=10, but there aren't 10 different environments for each species in the fitting database, so it's perhaps unsurprising that it fails in a weird way. Have you tried a more realistic database, say a few tens of perturbed configurations (with correct energies, forces, etc.)?

                            Noam

Noam Bernstein, Ph.D., Center for Materials Physics and Technology, U.S. Naval Research Laboratory, T +1 202 404 8628, F +1 202 404 7546, https://www.nrl.navy.mil

gabor1 commented 4 years ago

n_sparse automatically gets reduced to the maximum available number. The problem occurs when that number is zero: it then fails with a LAPACK error.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

(In reply to bernstei, 31 Mar 2020, 13:56)
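The LAPACK error in the original report fits this explanation. LAPACK documents argument 4 of DPOTRF (Cholesky factorization) as LDA, the leading dimension of the matrix, and requires LDA >= max(1, N); a zero-sized covariance matrix stored with leading dimension 0 is therefore rejected with "parameter number 4 had an illegal value". We have not checked QUIP's source, so the assumption that an empty sparse set reaches DPOTRF with LDA = 0 is ours. The Python sketch below mirrors DPOTRF's argument checks and shows a fail-early alternative to the silent clamping; both function names are hypothetical, not QUIP API.

```python
def dpotrf_illegal_argument(uplo, n, lda):
    """Mirror the argument checks LAPACK's DPOTRF performs before factorizing.

    Returns the position of the first illegal argument (the number XERBLA
    reports), or 0 if the arguments are acceptable."""
    if uplo.upper() not in ("U", "L"):
        return 1
    if n < 0:
        return 2
    if lda < max(1, n):
        return 4
    return 0


def effective_n_sparse(requested, n_available, label):
    """Clamp n_sparse to the number of available environments, as described
    in the thread, but stop with a readable message when there are none.

    Hypothetical helper, not part of QUIP."""
    if n_available == 0:
        raise ValueError(
            f"descriptor {label}: no environments in the training set, "
            "so no sparse points can be selected"
        )
    return min(requested, n_available)


# An empty sparse set (N = 0) with leading dimension 0 fails on argument 4.
print(dpotrf_illegal_argument("U", 0, 0))   # 4
print(effective_n_sparse(10, 4, "C-H"))     # 4
```

Raising at fit time turns the late LAPACK failure during evaluation into an error that names the offending descriptor.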

bernstei commented 4 years ago

(In reply to gabor1, Mar 31, 2020, 9:09 AM)

So is it the lack of a C-C distance? It might be nice to test for that and either skip that descriptor or give a more meaningful error message.

                                    Noam

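The check suggested here (find species pairs with no environments, then either skip them or stop with a clear message) could look like the following Python sketch. The pair counts are assumed to come from a pre-scan of the training set; for ch4.xyz with a 3.0 Å cutoff there are four C-H distances, six H-H distances and no C-C distance (counting unordered pairs and ignoring periodic images). The function name and the `on_empty` option are hypothetical and do not correspond to any gap_fit flag.

```python
from itertools import combinations_with_replacement


def usable_pairs(species, pair_counts, on_empty="skip"):
    """Return the species pairs that have at least one environment.

    species:     element symbols present in the training set
    pair_counts: {("C", "H"): n, ...}, each key sorted alphabetically
    on_empty:    "skip" drops empty pairs with a warning, "error" raises
    """
    usable = []
    for pair in combinations_with_replacement(sorted(species), 2):
        if pair_counts.get(pair, 0) > 0:
            usable.append(pair)
        elif on_empty == "error":
            raise ValueError(
                f"no {pair[0]}-{pair[1]} distances within the cutoff; "
                "remove this descriptor or add suitable training data"
            )
        else:
            print(f"warning: skipping {pair[0]}-{pair[1]} descriptor (no environments)")
    return usable


# Pair counts for ch4.xyz with a 3.0 Å cutoff: no C-C distance at all.
counts = {("C", "H"): 4, ("H", "H"): 6}
print(usable_pairs({"C", "H"}, counts))
```

Skipping keeps an add_species-style fit running on small datasets, while the "error" mode is the safer choice when a missing pair would indicate an unrepresentative training set.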

gabor1 commented 4 years ago

Yes. If we manually include only C-H and H-H, it runs fine. The silly thing is that it only fails at evaluation: the training code is quite happy to create an XML file with no sparse points and an empty sparseX file.

-- Gábor

(In reply to bernstei, 31 Mar 2020, 14:11)

hsulab commented 4 years ago

Is it possible to use distance_2b with different parameters for different bonds, e.g. a 3.0 Å cutoff for C-C and 2.0 Å for H-H? And how can I manually include only a few bond types in the system, such as only C-H and H-H in CH4, using gap_fit?

gabor1 commented 4 years ago

Yes, of course. Just stop using "add_species=T" and give each pair descriptor explicitly. You can start by copying the descriptor strings that are reported in the output file when you did use add_species=T, and then modify them one by one or omit the ones you don't need.
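Following this advice, a gap= string with one explicitly specified distance_2b descriptor per species pair can also be generated by a small script, which makes per-pair settings such as cutoffs easy to vary. The Python sketch below only reuses the keys that appear in the gap_fit commands earlier in this thread (cutoff, covariance_type, delta, n_sparse, theta_uniform, sparse_method, Z1, Z2) and joins descriptors with ":" as in the C-H/H-H command; the 2.0 Å H-H and 3.0 Å C-H cutoffs are illustrative values, and the helper name is hypothetical.

```python
# Settings shared by every pair, copied from the gap_fit commands above.
COMMON = {
    "covariance_type": "ard_se",
    "delta": 2,
    "n_sparse": 10,
    "theta_uniform": 0.8,
    "sparse_method": "uniform",
}


def gap_string(pair_cutoffs, common=COMMON):
    """Build a gap={...} argument with one distance_2b descriptor per pair.

    pair_cutoffs maps (Z1, Z2) atomic numbers to that pair's cutoff in Å.
    Only the listed pairs are included, so e.g. C-C can simply be omitted."""
    descriptors = []
    for (z1, z2), cutoff in pair_cutoffs.items():
        opts = " ".join(f"{key}={value}" for key, value in common.items())
        descriptors.append(f"distance_2b cutoff={cutoff} {opts} Z1={z1} Z2={z2}")
    return "gap={" + ":".join(descriptors) + "}"


# H-H with a 2.0 Å cutoff and C-H with 3.0 Å; no C-C descriptor at all.
print(gap_string({(1, 1): 2.0, (1, 6): 3.0}))
```

The printed string has the same structure as the working C-H/H-H command in the original report, differing only in the per-pair cutoffs.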

gabor1 commented 4 years ago

Yes, of course. To do this most easily, do a run with "add_species=T", which uses the same settings for all species. Then look at the output file of that training run: it lists the descriptors created automatically by add_species, so you will find every species pair present in your dataset, now specified separately. Copy those specifications into your gap string (instead of the previous generic one); note that add_species is NOT part of them. Then adjust the settings of each descriptor (one per species pair) as desired.

-- Gábor

(In reply to Jiayan Xu, 10 Jul 2020, 23:32)

libAtoms / QUIP, issue #192