libAtoms / QUIP

libAtoms/QUIP molecular dynamics framework: https://libatoms.github.io
348 stars 122 forks source link

Beginner question -- energy in training XYZ #248

Open mike-zz opened 4 years ago

mike-zz commented 4 years ago

Hi everyone,

I followed instructions in the tutorial https://libatoms.github.io/GAP/gap_fitting_tutorial.html everything worked fine.

In the last column of the train.xyz file, if I am not wrong it is the interaction energy between an atom and the rest of the system. So then, how do we get it from DFT calculations?

Here is the first few lines in one of the train.xyz files I tried.

Lattice="9.312844978309194 0.0 0.0 0.0 9.312844978309194 0.0 0.0 0.0 9.312844978309194" Properties=species:S:1:pos:R:3:forces:R:3:energies:R:1 energy=32.18324799602758 pbc="T T T"
O        0.22563639       0.46002934       8.46625561      -1.97258821       0.11591430       0.81118587       0.15273914
H        3.46958566       7.58733309       0.19283273      -1.15026494      -0.18143450      -0.14036099       1.10531707
H        6.85779055       7.03169669       2.74381603       0.03128743       0.60371032       0.07879595       0.34787096
..............

Thank you in advance, Mike

gabor1 commented 4 years ago

as you say, atomic energies are not something that electronic structure calculations provide, and GAP doesn't use them. it uses the total energy, the forces and the stress (virial actually, which is -stress*volume). It doesn't matter if there is an "energies" column in your xyz file, or in fact any other column, only the above three pieces of data are used (and the positions of course)

-- Gábor

On 4 Nov 2020, at 23:21, mike-zz notifications@github.com wrote:

 Hi everyone,

I followed instructions in the tutorial https://libatoms.github.io/GAP/gap_fitting_tutorial.html everything worked fine.

In the last column of the train.xyz file, if I am not wrong it is the interaction energy between an atom and the rest of the system. So then, how do we get it from DFT calculations?

Here is the first few lines in one of the train.xyz files I tried.

Lattice="9.312844978309194 0.0 0.0 0.0 9.312844978309194 0.0 0.0 0.0 9.312844978309194" Properties=species:S:1:pos:R:3:forces:R:3:energies:R:1 energy=32.18324799602758 pbc="T T T" O 0.22563639 0.46002934 8.46625561 -1.97258821 0.11591430 0.81118587 0.15273914 H 3.46958566 7.58733309 0.19283273 -1.15026494 -0.18143450 -0.14036099 1.10531707 H 6.85779055 7.03169669 2.74381603 0.03128743 0.60371032 0.07879595 0.34787096 ..............

Thank you in advance, Mike

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

Thanks Gabor for your prompt response. Now, I am trying a system of 5 CO molecules, in the train.xyz file, I don't include the energy column, here is part of the file

   10
energy=-2947.8929646  pbc="T T T" Lattice="  20.00000     0.00000     0.00000     0.00000    20.00000     0.00000     0.00000     0.00000    20.00000" Properties=species:S:1:pos:R:3:forces:R:3
 C       10.1989669164    10.1553475786    10.6458928947  -0.0002586288     0.0000259572     0.0007552455
 O        9.8173927303    10.1246831229    11.7303085060   0.0003283609     0.0001751902    -0.0011506024
 C       10.2635803891    10.1105208316    14.5513843204   0.0004770144     0.0006077985    -0.0025403596
 O       10.0991624261     9.8735808373    15.6402980654  -0.0005782441    -0.0007160191     0.0026639640
 C       10.2746876775     7.1498445424    10.2952826687  -0.0001513481    -0.0002337130     0.0006137529
 O       10.0031010112     6.8333607527    11.3664165564   0.0002058801     0.0000820104    -0.0005905067
 C        7.1431841572     7.1247312742    10.2287591961   0.0002417069    -0.0000307047    -0.0007744168
 O        6.6994558855     7.1898900581    11.2760760948  -0.0004001187     0.0001146861    -0.0000619061
 C        7.3260054924     6.8359451820    13.7579382104   0.0000377434    -0.0001157271     0.0011517107
 O        6.7251050822     6.8184026795    14.7372309734   0.0001314678    -0.0001226373    -0.0004144085
     1
energy=-144.3245392 free_energy=-144.3245392  pbc="T T T" Lattice="  20.00000     0.00000     0.00000     0.00000    20.00000     0.00000     0.00000     0.00000    20.00000" Properties=species:S:1:pos:R:3:forces:R:3
 C 0.00 0.00 0.00 0.00 0.00 0.00
     1
energy=-428.5753784 free_energy=-428.5753784  pbc="T T T" Lattice="  20.00000     0.00000     0.00000     0.00000    20.00000     0.00000     0.00000     0.00000    20.00000" Properties=species:S:1:pos:R:3:forces:R:3
 O 0.00 0.00 0.00 0.00 0.00 0.00

then I use the command

../gap_fit energy_parameter_name=dft_energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz default_sigma={0.008 0.04 0 0} gap={distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10}

there is an error

Atom species 6 present in teaching XYZ, but not found corresponding isolated representative
 Atom species 8 present in teaching XYZ, but not found corresponding isolated representative
 SYSTEM ABORT: Determination of e0 was requested to be based on isolated atom energies, but not all  atom types
 present in the XYZ had an isolated representative.

I think this issue is addressed in a previous thread, but I am not sure how to fix/use e0. Do you have any suggestion?

Mike

gabor1 commented 4 years ago

The isolated atoms are present in your training file, but their energies are not picked up because you have an error in your command line. You are telling gap that the energy will be indicated by the key “dft_energy” in the xyz file when in fact it isn’t, you are calling it “energy”. Does this make sense?

-- Gábor

On 5 Nov 2020, at 00:05, mike-zz notifications@github.com wrote:

 Thanks Gabor for your prompt response. Now, I am trying a system of 5 CO molecules, in the train.xyz file, I don't include the energy column, here is part of the file

10 energy=-2947.8929646 pbc="T T T" Lattice=" 20.00000 0.00000 0.00000 0.00000 20.00000 0.00000 0.00000 0.00000 20.00000" Properties=species:S:1:pos:R:3:forces:R:3 C 10.1989669164 10.1553475786 10.6458928947 -0.0002586288 0.0000259572 0.0007552455 O 9.8173927303 10.1246831229 11.7303085060 0.0003283609 0.0001751902 -0.0011506024 C 10.2635803891 10.1105208316 14.5513843204 0.0004770144 0.0006077985 -0.0025403596 O 10.0991624261 9.8735808373 15.6402980654 -0.0005782441 -0.0007160191 0.0026639640 C 10.2746876775 7.1498445424 10.2952826687 -0.0001513481 -0.0002337130 0.0006137529 O 10.0031010112 6.8333607527 11.3664165564 0.0002058801 0.0000820104 -0.0005905067 C 7.1431841572 7.1247312742 10.2287591961 0.0002417069 -0.0000307047 -0.0007744168 O 6.6994558855 7.1898900581 11.2760760948 -0.0004001187 0.0001146861 -0.0000619061 C 7.3260054924 6.8359451820 13.7579382104 0.0000377434 -0.0001157271 0.0011517107 O 6.7251050822 6.8184026795 14.7372309734 0.0001314678 -0.0001226373 -0.0004144085 1 energy=-144.3245392 free_energy=-144.3245392 pbc="T T T" Lattice=" 20.00000 0.00000 0.00000 0.00000 20.00000 0.00000 0.00000 0.00000 20.00000" Properties=species:S:1:pos:R:3:forces:R:3 C 0.00 0.00 0.00 0.00 0.00 0.00 1 energy=-428.5753784 free_energy=-428.5753784 pbc="T T T" Lattice=" 20.00000 0.00000 0.00000 0.00000 20.00000 0.00000 0.00000 0.00000 20.00000" Properties=species:S:1:pos:R:3:forces:R:3 O 0.00 0.00 0.00 0.00 0.00 0.00 then I use the command

../gap_fit energy_parameter_name=dft_energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz default_sigma={0.008 0.04 0 0} gap={distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10}

there is an error

Atom species 6 present in teaching XYZ, but not found corresponding isolated representative Atom species 8 present in teaching XYZ, but not found corresponding isolated representative SYSTEM ABORT: Determination of e0 was requested to be based on isolated atom energies, but not all atom types present in the XYZ had an isolated representative. I think this issue is addressed in a previous thread, but I am not sure how to fix/use e0. Do you have any suggestion?

Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

You are right. Thanks a lot!

mike-zz commented 4 years ago

I have some more beginner questions.

gap_fit generates a GAP.xml file and some GAP.xml.sparseX files. For the system I am testing, 2b generates 3 GAP.xml.sparseX files, 2b+3b generates 9, and 2b+3b+soap generates 11:

  1. In each GAP.xml.sparseX* file, there is a column of numbers. What parameter is it in this paper https://arxiv.org/pdf/1502.01366.pdf ?

  2. In a given GAP approach (say, 2b+3b+soap), is there any parameter that controls the number of GAP.xml.sparseX* files?

  3. I tried some LAMMPS runs for a system of ~100 atoms, using these GAPs on 32 CPUs. The best performance was 0.22 ns/day. LAMMPS settings aside, I think there is something not quite right here. Do you have any comment/suggestion?

Thanks!

--Mike

gabor1 commented 4 years ago

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 11 Nov 2020, at 00:24, mike-zz notifications@github.com wrote:

I have some more beginner questions.

gap_fit generates a GAP.xml file and some GAP.xml.sparseX files. For the system I am testing, 2b generates 3 GAP.xml.sparseX files, 2b+3b generates 9, and 2b+3b+soap generates 11:

• In each GAP.xml.sparseX* file, there is a column of numbers. What parameter is it in this paper https://arxiv.org/pdf/1502.01366.pdf ?

the sparseX file contains the d_i vectors of Eq 9 of that paper

• In a given GAP approach (say, 2b+3b+soap), is there any parameter that controls the number of GAP.xml.sparseX* files?

The number of sparseX files equals the number of different descriptors, but with different element contributions separated out. So there is a separate sparseX file for each element pair for 2b, each element triplet for 3b, and for each element for soap (the element there is the center of the environment described by SOAP)

• I tried some LAMMPS runs for a system of ~100 atoms, using these GAPs on 32 CPUs. The best performance was 0.22 ns/day. LAMMPS settings aside, I think there is something not quite right here. Do you have any comment/suggestion?

the typical speed of a large GAP is 10-100 ms /atom/core. Your speed is 1002201000/86400/32 = 8 seconds /atom/core (assuming 1fs time step), which is indeed quite poor. The 3b can take a particularly long time if you chose a large cutoff, we recommend not using a 3b term, or keeping its cutoff very short. How many different elements do you have? Can you time a pure SOAP fit (not recommended for actual production runs, 2b is important because it can capture exchange repulsion very effectively) ?

Gabor

Thanks!

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

Thank you, Gabor. I have 2 different elements in the system. I will try a pure SOAP fit and let you know. Thanks again!

Mike

mike-zz commented 4 years ago

Hi Gabor,

I have tried SOAP and 2b+SOAP (~100 atom system, LAMMPS, 32 CPUs). The performance of SOAP is 1.267 ns/day, and that of 2b+SOAP is 0.164 ns/day. For the SOAP training, I used a pretty modest set (for my learning purpose) of parameters: l_max=4, n_max=4, and n_sparse=400.

Do you have any suggestion?

Thanks! Mike

gabor1 commented 4 years ago

can you show your complete gap_fit command line? the 2b really should not take much time at all. we typically use nmax=8 lmax=4 as "loose" settings.

-- Gábor

On 12 Nov 2020, at 00:54, mike-zz notifications@github.com wrote:

 Hi Gabor,

I have tried SOAP and 2b+SOAP (~100 atom system, LAMMPS, 32 CPUs). The performance of SOAP is 1.267 ns/day, and that of 2b+SOAP is 0.164 ns/day. For the SOAP training, I used a pretty modest set (for my learning purpose) of parameters: l_max=4, n_max=4, and n_sparse=400.

Do you have any suggestion?

Thanks! Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

gabor1 commented 4 years ago

what is your MD time step?

-- Gábor

On 12 Nov 2020, at 00:54, mike-zz notifications@github.com wrote:

 Hi Gabor,

I have tried SOAP and 2b+SOAP (~100 atom system, LAMMPS, 32 CPUs). The performance of SOAP is 1.267 ns/day, and that of 2b+SOAP is 0.164 ns/day. For the SOAP training, I used a pretty modest set (for my learning purpose) of parameters: l_max=4, n_max=4, and n_sparse=400.

Do you have any suggestion?

Thanks! Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

Here is the command line for 2b+SOAP (the same for SOAP with the 2b part removed)

gap_fit energy_parameter_name=dft_energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz default_sigma={0.001 0.04 0 0} gap={distance_2b cutoff=5.5 covariance_type=ard_se delta=0.45 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=100 : soap cutoff=5.5 covariance_type=dot_product zeta=3 delta=0.016 atom_sigma=0.7 l_max=4 n_max=4 n_sparse=400 sparse_method=cur_points}

The MD time step was set at 1fs.

Another issue is that, for the SOAP parameters, with "high" zeta, l_max, n_max, n_sparse, it takes a very long time to train a system. Sometimes, there are memory problems. Is it possible to run it on multiplecores? How do we deal with the memory problems?

Thanks! Mike

bernstei commented 4 years ago

I have seen some very large times with long cutoff 2b.

                    Noam
gabor1 commented 4 years ago

5.5 is not "long" cutoff I think. But 100 Gaussians? That’s a bit of an overkill! I would try 10 or 15.

zeta doesn’t impact anything (speed or memory), but there is scaling with l_maxn_maxn_max, and also n_sparse. gap_fit is parallel with openmp, so if you have multiple cores, you can set OMP_NUM_THREADS= (having compiled with an arch that includes openmp!) and it should be nearly linearly scaling up to the entire machine. Memorywise, yes, you need a large memory machine for a bit training. for training a "big" potential, with 10K n_sparse, n_max=l_max=12, half a million energies/forces (number of scalars), you need over a TB of memory. the memory is roughly linear in n_sparse*n_data. time also goes up with cutoff because there are more cross terms in the forces.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 12 Nov 2020, at 01:15, mike-zz notifications@github.com wrote:

Here is the command line for 2b+SOAP (the same for SOAP with the 2b part removed)

gap_fit energy_parameter_name=dft_energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz default_sigma={0.001 0.04 0 0} gap={distance_2b cutoff=5.5 covariance_type=ard_se delta=0.45 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=100 : soap cutoff=5.5 covariance_type=dot_product zeta=3 delta=0.016 atom_sigma=0.7 l_max=4 n_max=4 n_sparse=400 sparse_method=cur_points}

The MD time step was set at 1fs.

Another issue is that, for the SOAP parameters, with "high" zeta, l_max, n_max, n_sparse, it takes a very long time to train a system. Sometimes, there are memory problems. Is it possible to run it on multiplecores? How do we deal with the memory problems?

Thanks! Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

Some tests show that a longer cutoff gives a higher accuracy (RMSE of energy/atom). I suppose the cutoff is the one in eqn(24) in https://arxiv.org/pdf/1502.01366.pdf Is there any rule for the (2b) cutoff selection? Is the first minimum of g(r) of an atom pair a good choice? By "100 Gaussians", what parameter in my command line did you talk about?

Thanks! Mike

gabor1 commented 4 years ago

On 12 Nov 2020, at 22:54, mike-zz notifications@github.com wrote:

Some tests show that a longer cutoff gives a higher accuracy (RMSE of energy/atom). I suppose the cutoff is the one in eqn(24) in https://arxiv.org/pdf/1502.01366.pdf

yes

Is there any rule for the (2b) cutoff selection? Is the first minimum of g(r) of an atom pair a good choice?

depends on what physics you want to capture. for short range repulsion, you can keep it really just nearest neighbour. for capturing a bit of vdW and fixed-charge electrostatics, of course you want it longer, even 10A.

By "100 Gaussians", what parameter in my command line did you talk about?

n_sparse=100 in your distance_2b.

Are you doing water? that’s a bit tricky because you have extremely different intramolecular OH and intermolecular OH interactions. I have had a lot of success with a double-soap potential, here are the parameters: (it was fitting to Bingqing Cheng’s water dataset from her PNAS paper (I think))

at_file=dataset_1593_eVAng.xyz sparse_jitter=1e-10 gap={soap cutoff=3.0 cutoff_transition_width=0.5 n_max=8 l_max=8 delta=0.1 atom_sigma=0.3 zeta=4 n_sparse=1500 normalise=T sparse_method=cur_points add_species covariance_type=dot_product : soap cutoff=6.0 cutoff_transition_width=1.0 n_max=8 l_max=8 delta=0.1 atom_sigma=0.6 zeta=4 n_sparse=1500 normalise=T sparse_method=cur_points add_species covariance_type=dot_product} e0_method=average default_sigma={0.001 0.05 0.0 0.0} energy_parameter_name=TotEnergy do_copy_at_file=F sparse_separate_file=T gp_file=gp_soap_soap_sig_0.001_sp3k_1-1593.xml

Thanks! Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

mike-zz commented 4 years ago

Thank you! I am testing a CO gas. What is the reason why one can use a pretty low n_sparse for 2b and but needs to use a very high n_parse for SOAP?

--Mike

gabor1 commented 4 years ago

Because the dimensionality of the descriptor is 1D for a 2b potential (just the distance between the atoms), and many hundreds of dimensions for SOAP (it is a nearly full representation of the entire neighbourhood)

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 12 Nov 2020, at 23:59, mike-zz notifications@github.com wrote:

Thank you! I am testing a CO gas. What is the reason why one can use a pretty low n_sparse for 2b and but needs to use a very high n_parse for SOAP?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

Got it. Thanks a lot!

--Mike

gabor1 commented 4 years ago

I think CO gas (ani molecular gas) will be similar. The CO bond within the molecule is totally different from the weaker intermolecular CO bond which only the soap can pick up. try the double soap.

-- Gábor

On 13 Nov 2020, at 00:17, mike-zz notifications@github.com wrote:

 Got it. Thanks a lot!

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 4 years ago

In the double soap, one is for bonded interactions and one is for non-bonded interactions, right? Should cutoff_transition_width be scaled with the cutoff?

There are 4 numbers in default_sigma, what parameter in this paper does the last number correspond to https://arxiv.org/pdf/1502.01366.pdf ? What are delta and atom_sigma?

--Mike

gabor1 commented 3 years ago

On 13 Nov 2020, at 00:41, mike-zz notifications@github.com wrote:

In the double soap, one is for bonded interactions and one is for non-bonded interactions, right?

broadly speaking - but there is no explicit distinction between bonded and non-bonded. We are just saying that nearby interactions contribute, we resolve them in more detail, and then further out we smear the atoms more, and so resolve them in less spatial detail, which is reflecting the fact that intermolecular interactions are smoother than e.g. exchange repulsion

Should cutoff_transition_width be scaled with the cutoff?

yes.

There are 4 (sometimes, 3) numbers in default_sigma, what are they in https://arxiv.org/pdf/1502.01366.pdf ? and also, delta and atom_sigma?

3 only for very old files, which need updating to 4. they are the regularisers for energy/atom, force, virial stress/atom and any hessian data (rarely used, none in any published potential). if you don’t have any given data type, just give 0. This is mentioned in Table 2 (with 3 values, this paper predates the Hessian training)

delta is the weight of a given descriptor. (eq 21 - there it’s called sigma_w. we used delta in eq 45 for the same thing, and also in later papers)

atom_sigma is the Gaussian smearing width of the atoms in forming the neighbour density. (eq 36)

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

mike-zz commented 3 years ago

Thank you!

Today I tried parallel QUIP on a different machine. The GAP version is 1598315093. When I did a 2b fit, it appeared to have this error

TIMER: gpFull_covarianceMatrix_sparse_Coordinate3         done in 36.159666000000001 cpu secs, 4.1360933771356940 wall clock secs.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2aaaabf5d59f in ???
#1  0xb05b09 in ???
#2  0xa494dc in ???

Do you have any idea why?

--Mike

gabor1 commented 3 years ago

No. Does it occur on the same machine if you run it serial? If you package up the inputs I'm happy to try and reproduce it (and if I can, we can fix it)

-- Gábor

On 17 Nov 2020, at 04:04, mike-zz notifications@github.com wrote:

 Thank you!

Today I tried parallel QUIP on a different machine. The GAP version is 1598315093. When I did a 2b fit, it appeared to have this error

TIMER: gpFull_covarianceMatrix_sparse_Coordinate3 done in 36.159666000000001 cpu secs, 4.1360933771356940 wall clock secs.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

0 0x2aaaabf5d59f in ???

1 0xb05b09 in ???

2 0xa494dc in ???

Do you have any idea why?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Thank you for offering help. It was related to lapack/blas, and I fixed it.

Now I have an issue regarding the code's parallelization. I think I didn't compiled QUIP properly. For a 2b fit (of a 2-element system), 1 CPU generated 3 sparseX files, and if I used 2 CPUs they generated 6 sparseX files (and first 3 files are basically the same as the last 3 files). So I think the 2CPUs just did a serial fit twice.

here is how I compiled QUIP

module purge
module  load gcc/9.3.0
module load openmpi/4.0.2
export QUIP_ARCH=linux_x86_64_gfortran_openmp
make config 
..... enter lapack/blas link...hit enters...enter 'y' for GAP... hit enters..... 
make 

Do you have any suggestion?

--Mike

gabor1 commented 3 years ago

You are compiling the threaded version. ...openmpi arch is the MPI parallel version

-- Gábor

On 18 Nov 2020, at 01:38, mike-zz notifications@github.com wrote:

 Thank you for offering help. It was related to lapack/blas, and I fixed it.

Now I have an issue regarding the code's parallelization. I think I didn't compiled QUIP properly. For a 2b fit (of a 2-element system), 1 CPU generated 3 sparseX files, and if I used 2 CPUs they generated 6 sparseX files (and first 3 files are basically the same as the last 3 files). So I think the 2CPUs just did a serial fit twice.

here is how I compiled QUIP

module purge module load gcc/9.3.0 module load openmpi/4.0.2 export QUIP_ARCH=linux_x86_64_gfortran_openmp make config ..... enter lapack/blas link...hit enters...enter 'y' for GAP... hit enters..... make

Do you have any suggestion?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

I have the same problem with QUIP_ARCH=linux_x86_64_gfortran_openmpi

If I understood correctly, we can still use multiple CPUs with openmp (settting OMP_NUM_THREADS equal, say, 8). So then, why did the 2CPUs print out 6 sparseX files?

--Mike

gabor1 commented 3 years ago

um.. so after compiling with the openmpi arch, did you run it with OMP_NUM_THREADS=1 and "mpirun -np 2 gap_fit … " ?

If you want to use openMP then you compile with an arch like …gfortran_openmp, then set OMP_NUM_THREADS=2, and do NOT run with mpirun… just normally "gap_fit … "

sorry if this is obvious, but your messages are a bit cryptic.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 18 Nov 2020, at 23:59, mike-zz notifications@github.com wrote:

I have the same problem with QUIP_ARCH=linux_x86_64_gfortran_openmpi

If I understood correctly, we can still use multiple CPUs with openmp (settting OMP_NUM_THREADS equal, say, 8). So then, why did the 2CPUs print out 6 sparseX files?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Yes, for openmpi I used that setting (OMP_NUM_THREADS=1 and "mpirun -np 2 gap_fit … ") .

For openmp, I made a mess... It seems OK now.

Thank you! --Mike

gabor1 commented 3 years ago

do you still have a problem with running MPI ?

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 19 Nov 2020, at 01:33, mike-zz notifications@github.com wrote:

Yes, for openmpi I used that setting (OMP_NUM_THREADS=1 and "mpirun -np 2 gap_fit … ") .

For openmp, I made a mess... It seems OK now.

Thank you! --Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Yes, but probably I don't need the openmpi version now as gap_fit is parallel with openmp. (Probably, openmpi QUIP is better for LAMMPS simulations? I compiled serial QUIP with LAMMPS, is it one of the reasons why my LAMMPS simulations were so slow (see my post on 10/11)?)

How does gap_fit time depend on n_max, l_max? The setting OMP_NUM_THREADS= is probably an important input, is there any way to determine the "best" ?

Thanks! --Mike

gabor1 commented 3 years ago

Sorry I realised I might have accidentally mislead you. OpenMP works for both training and prediction, but MPI is only implemented for prediction, not for training. But with lammps (which is itself MPI), you want to compile serial, as you did, but of course you then want to run lammps in parallel!

Generally the scaling is n_maxn_maxl_max, but this doesn’t mean that you can take n_max to be small! We typically use l_max = 0.5 * n_max as a heuristic.

OMP_NUM_THREADS should be set to however many cores you have!

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 20 Nov 2020, at 01:39, mike-zz notifications@github.com wrote:

Yes, but probably I don't need the openmpi version now as gap_fit is parallel with openmp. (Probably, openmpi QUIP is better for LAMMPS simulations? I compiled serial QUIP with LAMMPS, is it one of the reasons why my LAMMPS simulations were so slow (see my post on 10/11)?)

How does gap_fit time depend on n_max, l_max? The setting OMP_NUM_THREADS= is probably an important input, is there any way to determine the "best" ?

Thanks! --Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

bernstei commented 3 years ago

On Nov 20, 2020, at 4:49 AM, gabor1 notifications@github.com wrote:

Sorry I realised I might have accidentally mislead you. OpenMP works for both training and prediction, but MPI is only implemented for prediction, not for training. But with lammps (which is itself MPI), you want to compile serial, as you did, but of course you then want to run lammps in parallel!

Just to clarify what Gabor is saying, for LAMMPS, you compile serial QUIP, and LAMMPS with MPI, and LAMMPS provides all the parallelism.

Generally the scaling is n_maxn_maxl_max, but this doesn’t mean that you can take n_max to be small! We typically use l_max = 0.5 * n_max as a heuristic.

or even n_max/3

OMP_NUM_THREADS should be set to however many cores you have!

Not necessarily on Intel quad CPU (e.g. 4x18 cores = 72 cores) - I've found far better speed with OMP_NUM_THREADS = ncores/2 for GAP fitting.

                            Noam

|| |U.S. NAVAL| |RESEARCH| LABORATORY

Noam Bernstein, Ph.D. Center for Materials Physics and Technology U.S. Naval Research Laboratory T +1 202 404 8628 F +1 202 404 7546 https://www.nrl.navy.mil https://www.nrl.navy.mil/

mike-zz commented 3 years ago

Thank you for your suggestions!

Just want to be sure I don't get anything wrong. In the train.xyz file, the correct units of the positions and forces are Angstrom and eV/Angstrom, respectively. Right?

Thanks! --Mike

gabor1 commented 3 years ago

yes, energies in eV (total energy, not energy per atom), forces in eV/A and virial stress in eV (not intensive stress, but virial=-stress*volume)

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 21 Nov 2020, at 21:18, mike-zz notifications@github.com wrote:

Thank you for your suggestions!

Just want to be sure I don't get anything wrong. In the train.xyz file, the correct units of the positions and forces are Angstrom and eV/Angstrom, respectively. Right?

Thanks! --Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Thank you!

--Mike

mike-zz commented 3 years ago

if I have a GAP for system A-B (system of 2 components A and B), and another GAP for system A (system of 1 component A), can I use these 2 GAPs for system 2A-B (system of 2 components A and B but with higher A content compared to that in system A-B) in lammps?

Thanks! --Mike

gabor1 commented 3 years ago

I don’t think that makes sense. You can only use one GAP at a time. What you should do is train a new GAP with configurations of A-B and configurations of pure A. But you will still be "extrapolating" when you make the composition 2AB. The point is the following: with SOAP-GAP, the potential is a function of environment of atoms within the cutoff. The potential is good at interpolation, so if your predictions are for systems where each ENVIRONMENT is something that is near what was in the database, the model will be good. So in your case of a two-component molecular liquid, you need to ensure that the local environments span all the way from pure A to pure B. Then you are well covered. You can test your datasets for this. Take your A-B dataset, and make a histogram of A/B ratios in spheres around A radius=cutoff. Your histogram will peak around A/B=1/2, but does it reach down to 0 and up to 1 ? That depends on how many data configurations you have, and how big they are (because if they are big enough, then here are more fluctuations in the local environment, and you are more likely to see extreme ratios). If you add pure A, pure B, 2AB and 2BA, then each of those will show ratios centered around their nominal ratio, so you are more likely to cover the full range.

Let me know if this makes sense.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 2 Dec 2020, at 17:48, mike-zz notifications@github.com wrote:

if I have a GAP for system A-B (system of 2 components A and B), and another GAP for system A (system of 1 component A), can I use these 2 GAPs for system 2A-B (system of 2 components A and B but with higher A content compared to that in system A-B) in lammps?

Thanks! --Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

It took me a while to digest this :). Thanks!

I have another (naive) question: If I have a system with an impurity (only one impurity atom in the unit cell), why doesn't 2b fit work?

--Mike

gabor1 commented 3 years ago

because the 2b tempalte you are using is trying to generate a 2b term for each pair, but you don’t have a pair of impurities near one another. so the solution is to give a number of 2b terms explicitly with Z={x y} for each where x and y are the atomic numbers of the pair, and don’t include your impurities.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 10 Dec 2020, at 04:38, mike-zz notifications@github.com wrote:

It took me a while to digest this :). Thanks!

I have another (naive) question: If I have a system with an impurity (only one impurity atom in the unit cell), why doesn't 2b fit work?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Is it possible to just exclude the impurity-impurity pair as we still need the pairs of the impurity and other atoms (assuming that one uses only 2b)?

--Mike

bernstei commented 3 years ago

On Dec 10, 2020, at 12:02 PM, mike-zz notifications@github.com<mailto:notifications@github.com> wrote:

Is it possible to just exclude the impurity-impurity pair as we still need the pairs of the impurity and other atoms (assuming that one uses only 2b)?

If you set add_species=F you can explicitly control which pairs of species will get 2b terms.

Noam

gabor1 commented 3 years ago

Indeed. In fact specifying Z={x y} where x and y are atomic numbers I think turns off add_species automatically. You can just copy/past the descriptor from the output of the gap_fit (because it tells you which elements were found and how it expanded the 2b term)

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 10 Dec 2020, at 17:03, bernstei notifications@github.com wrote:

On Dec 10, 2020, at 12:02 PM, mike-zz notifications@github.com<mailto:notifications@github.com> wrote:

Is it possible to just exclude the impurity-impurity pair as we still need the pairs of the impurity and other atoms (assuming that one uses only 2b)?

If you set add_species=F you can explicitly control which pairs of species will get 2b terms.

Noam

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Thanks! Is this the correct way

gap={distance_2b add_species=F Z={Z1 Z1} Z={Z1 Z2} }

in which Z1 is the atomic number of the regular atoms, and Z2 is that of the impurity?

--Mike

gabor1 commented 3 years ago

you only need the Z={} once.

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 10 Dec 2020, at 18:01, mike-zz notifications@github.com wrote:

Thanks! Is this the correct way

gap={distance_2b add_species=F Z={Z1 Z1} Z={Z1 Z2} }

in which Z1 is the atomic number of the regular atoms, and Z2 is that of the impurity?

--Mike

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

bernstei commented 3 years ago

On Dec 10, 2020, at 1:08 PM, gabor1 notifications@github.com<mailto:notifications@github.com> wrote:

you only need the Z={} once.

Those were different Z pairs. You need multiple descriptors, separated by semicolons: gap={distance_2b add_species=F Z={Z1 Z1}; distance_2b add_species=F Z={Z1 Z2} }

Noam

gabor1 commented 3 years ago

oh yes sorry. each pair needs its own string and options (and you can adjust cutoff, gaussian width etc, which you might very well want to do, H-H is a quite different function from La-La .

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

On 10 Dec 2020, at 18:09, bernstei notifications@github.com wrote:

On Dec 10, 2020, at 1:08 PM, gabor1 notifications@github.com<mailto:notifications@github.com> wrote:

you only need the Z={} once.

Those were different Z pairs. You need multiple descriptors, separated by semicolons: gap={distance_2b add_species=F Z={Z1 Z1}; distance_2b add_species=F Z={Z1 Z2} }

Noam — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mike-zz commented 3 years ago

Thanks! I think it is elegant. Just a little detail: we use a colon to separate 2b and soap fits in gap={ distance_2b ... : soap ...}, but we need to use semicolons to separate 2b fits, do they (colon & semicolon) make any difference here?

bernstei commented 3 years ago

On Dec 10, 2020, at 1:25 PM, mike-zz notifications@github.com wrote: Thanks! I think it is elegant. Just a little detail: we use a colon to separate 2b and soap fits in gap={ distance_2b ... : soap ...}, but we need to use semicolons to separate 2b fits, do they (colon & semicolon) make any difference here?

No, I think I just made a mistake (I have scripts to do this, so I hardly ever look at the syntax carefully). It's colons everywhere.

mike-zz commented 3 years ago

Thank you!

--Mike

mike-zz commented 3 years ago

If I do a similar thing for BAC angles, is this the correct way gap={angle_3b add_species=F Z=Z_A Z1=Z_B Z2=Z_C} (Z_A = atomic number of A,...)?