qzhu2017 / CSP_BO

Crystal Structure Prediction with Bayesian Optimization
0 stars 1 forks source link

the Workflow of CSP with on-the-fly GPR #36

Open qzhu2017 opened 3 years ago

qzhu2017 commented 3 years ago
image

@yanxon , I am thinking of the above workflow for a realistic structure prediction based on the on-the-fly ML learning.

I am working on it now.

qzhu2017 commented 3 years ago

Regarding the BO selection

After each generation is done, rank the structures based on E/E_var If there are 50 structures, consider the first 25 sorted by energy, for each structure, if

qzhu2017 commented 3 years ago

@yanxon I just made some fixes and got the following output. Clearly, there is something wrong with the bo selection. Also, why do you just try some more popular acquisition functions?

qzhu@cms CSP_ON_THE_FLY (master) $ python csp.py 2>err.txt
  0   0 Si8    Fd-3m              -4.473[   0.045]  32.97   0.04
  0   1 Si8    P4mm               -4.013[   0.017]  26.42   0.15
  0   2 Si8    P6                 -3.795[   0.122]  27.83   0.27
  0   3 Si8    P-1               -17.862[   0.154]  10.00   0.59
  0   4 Si8    Pm-3m              -6.350[   0.009]  14.35   0.12
  0   5 Si8    P4/mmm             -6.818[   0.031]  16.52   0.16
  0   6 Si8    P2_12_12          -20.572[   0.107]   9.27   0.38
  0   7 Si8    P-6m2             -20.637[   0.088]   5.39   0.80
  0   8 Si8    I4_132             -9.513[   0.156]  14.43   0.09
  0   9 Si8    Cmme               -5.986[   0.032]  14.93   0.26
------Gaussian Process Regression------
Kernel: 10.025**2 *RBF(length=0.163) 13 energy (0.002) 1 forces (0.020)

Loss:       21.928 10.025  0.163  0.002 
Loss:   212572.031  9.826 10.000  0.002 
Loss:       21.910 10.025  0.165  0.002 
Struc    0 is picked:   -4.473[   0.045] -> DFT energy:   -4.562 eV/atom in   0.11 minutes
------Gaussian Process Regression------
Kernel: 10.025**2 *RBF(length=0.165) 14 energy (0.002) 1 forces (0.020)

Loss:       24.116 10.025  0.165  0.002 
Loss:   213495.041  9.872 10.000  0.002 
Loss:       24.105 10.025  0.166  0.002 
Struc    5 is picked:   -6.818[   0.031] -> DFT energy:   -4.646 eV/atom in   0.35 minutes
0 energy and 0 force points will be removed
  1   0 Si8    Fmmm              -17.992[   0.156]   9.29   0.22
  1   1 Si8    Pm-3m              -6.140[   0.008]  14.52   0.10
  1   2 Si8    P4_12_12           -9.515[   0.156]  14.24   0.33
  1   3 Si8    P4/mmm             -5.933[   0.157]  21.46   0.14
yanxon commented 3 years ago

@qzhu2017

I believe the GPR predict function is wrong: https://github.com/qzhu2017/CSP_BO/blob/bd42f83996be8ddf51c5381bbccae376747d1624/CSP_ON_THE_FLY/csp.py#L99

Please look at this print:

  0   0 Si8    Fm-3m             -33.215[   0.089]   3.39   0.74
  0   1 Si8    P-1                -8.481[   0.088]  18.62   0.35
  0   2 Si8    P4/nmm             -6.004[   0.158]  20.69   0.18
  0   3 Si8    Pm-3m              -3.853[   0.002]  30.05   0.04
  0   4 Si8    P-3m1             -13.016[   0.122]  24.73   0.32
  0   5 Si8    Pm-3m             -16.941[   0.072]   4.35   0.73
  0   6 Si8    P4/mmm             -6.808[   0.030]  16.65   0.16
  0   7 Si8    P-62m              -8.593[   0.087]  17.83   0.39
  0   8 Si8    Aem2              -14.111[   0.155]  10.57   0.34
  0   9 Si8    P6/m              -17.781[   0.127]   8.15   0.50

##### The output of the predict function #####
 MEAN :
[ 9.61721477 -4.03342387  0.06144399 -2.79338743  0.05972591 10.7512382
 -2.33564639 -2.83836347  0.23613284  1.19894331]

 COV: 
[[ 3.26238287e+01 -9.10718865e-01  6.31215754e-01  1.98279625e-02
   1.88620340e+00  2.61793839e+01  5.37950297e-01  2.11375443e+00
   6.89429162e+00  1.49797814e+01]
 [-9.10718865e-01  3.16343007e+01  9.38334892e-03  2.82088429e-02
  -2.58953778e-02 -7.68482919e-01  3.92704777e+00  1.23434156e+01
  -2.33296409e-01  1.32986462e-01]
 [ 6.31215754e-01  9.38334892e-03  1.02128291e+02 -3.18252580e-04
   1.42621114e+00  4.77218414e-01  3.11392945e-02  2.56575471e-01
   2.24734769e+01  1.07867609e+01]
 [ 1.98279625e-02  2.82088429e-02 -3.18252580e-04  1.21122775e-02
  -3.44314462e-03  1.79116625e-02  3.69137987e-02  1.97888068e-02
  -8.24397205e-03 -1.24045934e-02]
 [ 1.88620340e+00 -2.58953778e-02  1.42621114e+00 -3.44314462e-03
   6.08949558e+01  1.43827611e+00 -3.79100113e-02  6.03578224e-01
   1.08140283e+01  1.02676675e+01]
 [ 2.61793839e+01 -7.68482919e-01  4.77218414e-01  1.79116625e-02
   1.43827611e+00  2.13000972e+01  4.96307440e-01  1.62608123e+00
   5.23092325e+00  1.13592472e+01]
 [ 5.37950297e-01  3.92704777e+00  3.11392945e-02  3.69137987e-02
  -3.79100113e-02  4.96307440e-01  3.77066352e+00  4.35808920e+00
  -4.81239311e-02  5.59802411e-01]
 [ 2.11375443e+00  1.23434156e+01  2.56575471e-01  1.97888068e-02
   6.03578224e-01  1.62608123e+00  4.35808920e+00  3.10961262e+01
   1.56417394e+00  5.02571761e+00]
 [ 6.89429162e+00 -2.33296409e-01  2.24734769e+01 -8.24397205e-03
   1.08140283e+01  5.23092325e+00 -4.81239311e-02  1.56417394e+00
   9.89965805e+01  6.54796173e+01]
 [ 1.49797814e+01  1.32986462e-01  1.07867609e+01 -1.24045934e-02
   1.02676675e+01  1.13592472e+01  5.59802411e-01  5.02571761e+00
   6.54796173e+01  6.56236130e+01]]

BO Sampling:
 [16.36505287 -8.40947657  9.41068708 -2.8463476  -4.00344943 16.27422764
 -1.90551664 -4.25489791  6.22151493  8.32061537]
qzhu2017 commented 3 years ago

@yanxon The function of predict only returns the GP energy. But the total energy is the sum of GP energy + LJ energy. You need to add the return_cov to the predict_structure function. As we discussed yesterday, the collect_data function should be moved outside of the loop.

Also, I suggest you also try some simpler acquisition.

yanxon commented 3 years ago

@qzhu2017

I will work on the collect_data later and some other acquisition later.

You are right. The total energy is GP + LJ.

If the BO algorithm is correct, it will always pick the 0-th structure since it has the lowest energy.

  0   0 Si8    Fm-3m             -33.215[   0.089]   3.39   0.74

But in reality, it will more likely to pick the 1st structure which has the energy of `-4.03342387. We need to include the LJ potential so that the BO algorithm will more likely to pick the 0-th structure.

If we don't solve this energy issue, adding other acquisition function won't help.

yanxon commented 3 years ago

@qzhu2017

I fix the energy issue with GP + LJ. Now the selection is reasonable:

  0   0 Si8    I4_1/amd           -7.477[   0.021]  13.85   0.26
  0   1 Si8    P6_3/mmc           -4.604[   0.107]  24.76   0.14
  0   2 Si8    P6/mmm            -22.921[   0.141]   7.02   0.49
  0   3 Si8    Pm-3m              -6.328[   0.009]  14.64   0.29
  0   4 Si8    Pmc2_1             -9.059[   0.087]  13.29   0.50
 running on   16 total cores
 distrk:  each k-point on   16 cores,    1 groups
...

Struc    2 is picked:  -22.921[   0.141] -> DFT energy:   12.446 eV/atom in   0.64 minutes

I am still working on making the standard deviation and the covariance matrix. I believe the covariance matrix and the standard deviation are still inconsistent.

yanxon commented 3 years ago
  • [ ] add BO selection (needs a serious check)

I believe this time the BO selection is correct after fixes in 9fc1bda61b9655f8bf1b622d347c83c56bad53c7 and c770312520d0b42611610886d4cb68e0eaa25371.

Perhaps, we need to fix the standard deviation in #38. Please let me know what do you think.

yanxon commented 3 years ago

@qzhu2017

Now I will work on getting collect_data outside the loop. After that, I will try different acquisition functions.

yanxon commented 3 years ago

@qzhu2017

I got some results from CSP with BO for Si with the alpha=2

The structure is picked at the 19th generation:

 897  19   0 Si8    Im-3m              -4.999[   0.000]  14.23   0.34                                                  
 898  19   1 Si8    Cmmm               -6.445[   0.070]  16.70   0.41                                                  
 899  19   2 Si8    Pbam               -4.956[   0.064]  21.08   0.36                                                  
 900  19   3 Si8    Fd-3m              -5.696[   0.022]  22.09   0.08                                                  
 901  19   4 Si8    Pm-3m              -5.214[   0.000]  16.73   0.02                                                  
 902  19   5 Si8    I4_132             -3.250[   0.098]  25.00   0.05                                                  
 903  19   6 Si8    Pm-3m              -5.214[   0.000]  16.73   0.09                                                  
 904  19   7 Si8    Pm-3m              -5.214[   0.000]  16.73   0.10                                                  
 905  19   8 Si8    P6/mmm             -4.000[   0.229]  51.43   0.32                                                  
 906  19   9 Si8    P-1                -4.042[   0.120]  41.66   0.65                                                  
 907  19  10 Si8    P6/mmm             -2.244[   0.137]  21.16   0.08                                                  
 908  19  11 Si8    Pm-3m              -3.917[   0.001]  32.62   0.08                                                  
 909  19  12 Si8    P4/mmm             -2.839[   0.180]  27.80   0.16                                                  
 910  19  13 Si8    P4/mmm             -5.181[   0.037]  22.86   0.16                                                  
 911  19  14 Si8    P222               -5.240[   0.037]  13.19   0.90                                                  
 912  19  15 Si8    P-1                -6.362[   0.104]  44.44   0.24                                                  
 913  19  16 Si8    Pm-3m              -5.214[   0.000]  16.73   0.16                                                  
 914  19  17 Si8    Pm-3m              -5.214[   0.000]  16.73   0.10                                                  
 915  19  18 Si8    P4/mmm             -3.842[   0.131]  24.73   0.28                                                  
 916  19  19 Si8    Pm-3m              -5.214[   0.000]  16.73   0.15                                                  
 917  19  20 Si8    P6_322             -5.549[   0.069]  26.53   0.24                                                  
 918  19  21 Si8    P-1                -5.765[   0.041]  21.12   0.58                                                  
 919  19  22 Si8    P6/mmm             -5.733[   0.001]  14.71   0.10                                                  
 920  19  23 Si8    P6/mmm             -4.094[   0.166]  80.32   0.10   

Firstly, the algorithm picked 15th, 20th, and 13th structures. Then, it picked the 3rd structure. In my case, this will be after 46 single-point dft, excluding the initial database.

1032    3-th structure is picked from the BO selection
1033  running on   16 total cores
1034  distrk:  each k-point on   16 cores,    1 groups
1035  distr:  one band on    2 cores,    8 groups
1036  using from now: INCAR
1037  vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Jul 28 2017 17:12:03) complex
1038  POSCAR found :  1 types and       8 ions
1039  scaLAPACK will be used
1040  LDA part: xc-table for Pade appr. of Perdew
1041  generate k-points for:    6    6    6
1042  POSCAR, INCAR and KPOINTS ok, starting setup
1043  FFT: planning ...
1044  WAVECAR not read
1045  WARNING: random wavefunctions but no delay for mixing, default for NELMDL
1046  entering main loop
1047        N       E                     dE             d eps       ncg     rms          rms(c)
1048 DAV:   1     0.708267839324E+02    0.70827E+02   -0.11996E+04   960   0.109E+03
...
1058 DAV:  11    -0.431418740499E+02   -0.40764E-06   -0.77700E-06   784   0.145E-02
1059    1 F= -.43141874E+02 E0= -.43142258E+02  d E =0.115330E-02
1060 ------Gaussian Process Regression------
1061 Kernel: 22.048**2 *RBF(length=0.225) 55 energy (0.002) 43 forces (0.020)
1062 
1063 Loss:      279.182 22.048  0.225  0.002
1064 Loss:  1797530.663 20.906 10.000  0.002
1065 Loss:      279.099 22.048  0.226  0.002
1066    3-th structure: E/atom:   -5.393 eV/atom sg: Fd-3m

Again, it appears in the 23rd generation:

1806  23   0 Si8    I4/mmm             -5.038[   0.033]  26.76   0.34
1807  23   1 Si8    P4_2/mnm           -2.600[   0.133]  28.98   0.41
1808  23   2 Si8    P4_2/n             -5.055[   0.026]  21.26   0.51
1809  23   3 Si8    P6_3/m             -4.002[   0.001]   9.25   0.76
1810  23   4 Si8    P6/mmm             -6.855[   0.101]  86.21   0.19
1811  23   5 Si8    P4/mbm             -2.734[   0.041]  21.95   0.34
1812  23   6 Si8    Cccm               -5.225[   0.090]  34.10   0.17
1813  23   7 Si8    Pm-3m              -5.207[   0.000]  16.73   0.10
1814  23   8 Si8    Pmma               -3.636[   0.151]  61.86   0.36
1815  23   9 Si8    P6_3/mmc           -5.678[   0.034]  12.89   0.44
1816  23  10 Si8    P4_2/mmc           -4.374[   0.001]  14.39   0.57
1817  23  11 Si8    I4_1/amd           -3.278[   0.001]   9.65   0.49
1818  23  12 Si8    P-6m2              -4.281[   0.038]  39.45   0.27
1819  23  13 Si8    Im-3m              -4.998[   0.000]  14.22   0.26
1820  23  14 Si8    Pmmm               -5.968[   0.101]  21.79   0.20
1821  23  15 Si8    P-1                -2.322[   0.036]  29.38   0.41
1822  23  16 Si8    I4/mmm             -5.269[   0.001]  14.36   0.24
1823  23  17 Si8    I4_1/amd           -5.626[   0.024]  18.88   0.29
1824  23  18 Si8    Pm-3m              -5.207[   0.000]  16.73   0.46
1825  23  19 Si8    Fd-3m              -5.455[   0.001]  21.08   0.11
1826  23  20 Si8    Pm-3m              -5.207[   0.000]  16.73   0.08
1827  23  21 Si8    P6/mmm             -5.489[   0.029]  16.83   0.40
1828  23  22 Si8    P1                 -3.178[   0.083]  38.43   0.47
1829  23  23 Si8    P-1                -4.672[   0.214]  31.13   0.49
1830  23  24 Si8    P4_2/mmc           -5.737[   0.056]  37.29   0.26
1831  23  25 Si8    Pm-3m              -5.207[   0.000]  16.73   0.12
1832  23  26 Si8    P4_2/nmc           -5.320[   0.327]  39.87   0.31
1833  23  27 Si8    Immm               -1.872[   0.086]  18.77   0.40

After single-point of structures 8th, 14th, 9th, and 17th, Fd-3m is picked. Now the GPR give a better energy description:

2014   19-th structure is picked from the BO selection
2015  running on   16 total cores
2016  distrk:  each k-point on   16 cores,    1 groups
2017  distr:  one band on    2 cores,    8 groups
2018  using from now: INCAR
2019  vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Jul 28 2017 17:12:03) complex
2020  POSCAR found :  1 types and       8 ions
2021  scaLAPACK will be used
2022  LDA part: xc-table for Pade appr. of Perdew
2023  generate k-points for:    6    6    6
2024  POSCAR, INCAR and KPOINTS ok, starting setup
2025  FFT: planning ...
2026  WAVECAR not read
2027  WARNING: random wavefunctions but no delay for mixing, default for NELMDL
2028  entering main loop
2029        N       E                     dE             d eps       ncg     rms          rms(c)
2030 DAV:   1     0.699372618464E+02    0.69937E+02   -0.12078E+04   960   0.110E+03
...
2040 DAV:  11    -0.433574012902E+02   -0.85334E-06   -0.90574E-06   784   0.154E-02
2041    1 F= -.43357401E+02 E0= -.43357773E+02  d E =0.111427E-02
2042 ------Gaussian Process Regression------
2043 Kernel: 22.045**2 *RBF(length=0.248) 75 energy (0.002) 59 forces (0.020)
2044 
2045 Loss:      391.044 22.045  0.248  0.002
2046 Loss:  2785916.424 20.503 10.000  0.002
2047 Loss:      391.024 22.045  0.248  0.002
2048   19-th structure: E/atom:   -5.420 eV/atom sg: Fd-3m

At last, I believe the GPR predict the Fd-3m consistently at energy of -5.428 eV: 39 9 Si8 Fd-3m -5.428[ 0.000] 20.65 0.21

qzhu2017 commented 3 years ago

2021/01/10

@yanxon I think the results are much better. However, there needs an improvement that the structures with zero sigma should not be selected.

------Gaussian Process Regression------
Kernel: 22.296**2 *RBF(length=0.191) 240 energy (0.004) 190 forces (0.041)

 25   0 Si8    P6_3/mmc           -3.046[   0.238]  42.36   0.32
 25   1 Si8    Pm-3m              -5.058[   0.000]  17.44   0.39
skip the duplicate structures
 25   3 Si8    Cmmm               -3.649[   0.067]  35.55   0.39
 25   4 Si8    P-6c2              -4.220[   0.052]  32.00   1.00
 25   5 Si8    P-1                -2.293[   0.092]  42.90   1.16
 25   6 Si8    Fm-3m              -4.766[   0.000]  20.66   0.18
 25   7 Si8    P1                 -3.128[   0.024]  25.21   1.59
 25   8 Si8    P6/mmm             -4.650[   0.009]  27.16   0.31
 25   9 Si8    P4/mmm             -4.091[   0.049]  29.02   0.92
 25  10 Si8    P4/mmm             -3.365[   0.052]  29.49   0.55
 25  11 Si8    I4/mmm             -4.990[   0.000]  16.50   0.37
 25  12 Si8    P4/mmm             -4.201[   0.012]  18.38   0.63
 25  13 Si8    P1                 -3.681[   0.062]  44.76   0.85
 25  14 Si8    P-62m              -3.745[   0.098]  40.38   0.48
 25  15 Si8    P1                 -4.822[   0.046]  30.02   1.32
 25  16 Si8    I4/mmm             -3.960[   0.002]  29.25   0.31
 25  17 Si8    Fd-3m              -5.425[   0.000]  20.55   0.10
 25  18 Si8    Imm2               -4.507[   0.011]  24.97   0.93
skip the duplicate structures
 25  20 Si8    P222_1             -3.910[   0.010]  14.28   1.45
skip the duplicate structures
 25  22 Si8    Pm-3n              -4.551[   0.004]  22.17   0.25
 25  23 Si8    P4/mmm             -4.218[   0.000]  40.83   0.40
 25  24 Si8    P6_3/mmc           -4.365[   0.055]  23.97   1.05
 25  25 Si8    P-1                -4.315[   0.009]  23.71   1.22
 25  26 Si8    P4/mmm             -4.384[   0.008]  24.11   0.57
 25  27 Si8    P-6m2              -4.473[   0.018]  14.63   0.92
 25  28 Si8    I4/mmm            -10.528[   0.206]  22.44   0.79
skip the duplicate structures
 25  30 Si8    P-1                -4.927[   0.001]  16.56   1.53
 25  31 Si8    P1                 -4.210[   0.030]  34.01   1.11
 25  32 Si8    P1                 -4.569[   0.006]  24.05   1.45
 25  33 Si8    P6_3cm             -4.513[   0.007]  15.40   1.38
 25  34 Si8    Pmmm               -4.248[   0.061]  32.82   0.42
skip the duplicate structures
 25  36 Si8    Cmme               -3.260[   0.059]  57.25   0.44
 25  37 Si8    Pm-3m              -3.879[   0.000]  29.90   0.11
 25  38 Si8    P4/mmm             -4.848[   0.019]  52.75   0.41
 25  39 Si8    P1                 -3.410[   0.077]  54.91   0.71
 25  40 Si8    P6_3/mcm           -4.482[   0.007]  22.85   0.87
 25  41 Si8    P4_2/mmc           -4.707[   0.059]  24.66   0.36
 25  42 Si8    P6/mmm             -4.459[   0.078]  38.64   0.16
 25  43 Si8    Cmmm               -3.619[   0.104]  20.94   0.61
 25  44 Si8    Imma               -3.626[   0.162]  37.27   0.47
 25  45 Si8    P-1                -4.145[   0.023]  23.19   1.00
 25  46 Si8    Pmn2_1             -6.177[   0.031]  19.92   1.34
 25  47 Si8    I4/mmm             -1.618[   0.326]  39.29   0.33
 25  48 Si8    P4/mmm             -3.164[   0.021]  14.74   0.59
 25  49 Si8    P6mm               -3.638[   0.097]  43.86   1.35
Struc   25 is picked:  -10.528[   0.206] -> DFT energy:   -2.433 eV/atom in   0.28 minutes
Struc   41 is picked:   -6.177[   0.031] -> DFT energy:   -3.265 eV/atom in   0.55 minutes
**Struc   16 is picked:   -5.425[   0.000] -> DFT energy:   -5.425 eV/atom in   0.11 minutes**
Struc   36 is picked:   -4.707[   0.059] -> DFT energy:   -4.481 eV/atom in   0.15 minutes
Struc   37 is picked:   -4.459[   0.078] -> DFT energy:   -4.044 eV/atom in   0.26 minutes
Struc    1 is picked:   -5.058[   0.000] -> DFT energy:   -5.054 eV/atom in   0.20 minutes
Struc   10 is picked:   -4.990[   0.000] -> DFT energy:   -4.832 eV/atom in   0.16 minutes
Struc   26 is picked:   -4.927[   0.001] -> DFT energy:   -4.865 eV/atom in   2.46 minutes
Struc   33 is picked:   -4.848[   0.019] -> DFT energy:   -2.409 eV/atom in   0.44 minutes
yanxon commented 3 years ago

@qzhu2017

2021/01/10

@yanxon I think the results are much better. However, there needs an improvement that the structures with zero sigma should not be selected.

This is fixed in 8b123ca20372c23e930a24db8a59c0d7ed3d924a.

yanxon commented 3 years ago

@qzhu2017

collect_data is moved outside the loop (6ca1b06397d7a08e4743ee612519ae9ac7dbdf2f).

yanxon commented 3 years ago

@qzhu2017

Here I have a criterion to get the minimum energy (E_min in eV/atom) found so far by the GPR: https://github.com/qzhu2017/CSP_BO/blob/6ca1b06397d7a08e4743ee612519ae9ac7dbdf2f/CSP_ON_THE_FLY/csp.py#L438-L439

Initially, the E_min will be determined based on the initial database, which is around -5.003 eV/atom. E_min is required when probability of improvement or expected improvement is called by the BO.

I am thinking that whenever the BO hits new low, we should optimize the new lowest structure with VASP. Another route will be reoptimized immediately by GPR and do single-point optimization again.

yanxon commented 3 years ago

@qzhu2017

I encountered this error with VASP while running csp-multi.py:

775 Traceback (most recent call last):
776   File "csp-multi.py", line 480, in <module>
777     best_eng, best_forces = dft_run(best_struc, path=calc_folder)
778   File "csp-multi.py", line 231, in dft_run
779     eng = struc.get_potential_energy()
780   File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/atoms.py", line 733, in get_potential_energy
781     energy = self._calc.get_potential_energy(self)
782   File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 228, in get_potential_energy
783     self.update(atoms)
784   File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 82, in update
785     self.calculate(atoms)
786   File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 105, in calculate
787     self.run()
788   File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 176, in run
789     raise RuntimeError('Vasp exited with exit code: %d.  ' % exitcode)
790 RuntimeError: Vasp exited with exit code: 31744.
yanxon commented 3 years ago

Hi @qzhu2017,

This is not the way to increase 20% of volume: https://github.com/qzhu2017/CSP_BO/blob/59cf8e0ae835550143a4ae86cadb89d90db15edf/CSP_ON_THE_FLY/csp-multi.py#L374

For a cubic cell, it means that volume is increased by 72.8%:

a*1.2 * b*1.2 * c*1.2 = a*b*c * (1.2)**3 = V * 1.728

To increase by 20%, we need to set it to 1.063 instead of 1.2.

Do you think this will matter to the initial database?

yanxon commented 3 years ago

@qzhu2017

I just uploaded a script in 35b197abfd5035b3584c3e4d3ad624ccae46ca60.

You can run it with this command:

python example_cell_size.py models/Si.json database/Si_244_DFT.db

Results

GPR prediction of Si diamond structure before expansion 
    E: -5.415 eV/atom E_var:  0.0006907077 E_var_total:  0.0055256616 E0: -5.425 eV/atom F_MSE:  0.000 V:  20.44 A^3/atom

GPR prediction of diamond structure after expansion 
    E: -5.131 eV/atom E_var:  0.0077179254 E_var_total:  0.0617434035 V:  24.55 A^3/atom

GPR prediction of diamond structure (2,2,2) before expansion 
    E: -5.415 eV/atom E_var:  0.0000863347 E_var_total:  0.0055254196 E0: -5.425 V:  20.44 A^3/atom

GPR prediction of diamond structure (2,2,2) after expansion 
    E: -5.019 eV/atom E_var:  0.0012567027 E_var_total:  0.0804289713 V:  24.55 A^3/atom

If you look at the (1,1,1) diamond structure vs (2,2,2) diamond structure, they have the same energy. However, the std for (2,2,2) diamond structure is smaller.

In addition, the energies after the expansion are different despite the same volume.

qzhu2017 commented 3 years ago

@yanxon I just made a fix with a correct expansion. The results will look like this:

qzhu@cms examples (master) $ python example_cell_size.py models/Si.json database/Si_244_DFT.db
Processed 50 structures
------Gaussian Process Regression------
Kernel: 6.245**2 *RBF(length=0.561) 25 energy (0.008) 82 forces (0.163)

load the GP model from  models/Si.json
GPR prediction of diamond structure before expansion 
    E: -4.934 eV/atom E_var:  0.0048453197 E_var_total:  0.0387625579 V:  15.62 A^3/atom
GPR prediction of diamond structure after expansion 
    E: -5.361 eV/atom E_var:  0.0008895970 E_var_total:  0.0071167762 V:  18.77 A^3/atom
GPR prediction of diamond structure (2,2,2) before expansion 
    E: -4.934 eV/atom E_var:  0.0024226619 E_var_total:  0.0387625906 V:  15.62 A^3/atom
GPR prediction of diamond structure (2,2,2) after expansion 
    E: -5.361 eV/atom E_var:  0.0004448011 E_var_total:  0.0071168175 V:  18.77 A^3/atom

So the E_var is inconsistent but the E_var_total is consistent. I am expecting an opposite result. Can you look into this issue?

qzhu2017 commented 3 years ago

Needs to be selective when adding the force points. Currently, I am inclined to include the force points from low energy structures. This way, we don't need to compute too many Kff and Kefs, since they are considerably more expensive.

The current idea is to check how many force points can be neglected without losing too much accuracy. Needs some tests here.