Open qzhu2017 opened 3 years ago
After each generation is done, rank the structures based on E/E_var If there are 50 structures, consider the first 25 sorted by energy, for each structure, if
@yanxon I just made some fixes and got the following output. Clearly, there is something wrong with the bo selection. Also, why do you just try some more popular acquisition functions?
qzhu@cms CSP_ON_THE_FLY (master) $ python csp.py 2>err.txt
0 0 Si8 Fd-3m -4.473[ 0.045] 32.97 0.04
0 1 Si8 P4mm -4.013[ 0.017] 26.42 0.15
0 2 Si8 P6 -3.795[ 0.122] 27.83 0.27
0 3 Si8 P-1 -17.862[ 0.154] 10.00 0.59
0 4 Si8 Pm-3m -6.350[ 0.009] 14.35 0.12
0 5 Si8 P4/mmm -6.818[ 0.031] 16.52 0.16
0 6 Si8 P2_12_12 -20.572[ 0.107] 9.27 0.38
0 7 Si8 P-6m2 -20.637[ 0.088] 5.39 0.80
0 8 Si8 I4_132 -9.513[ 0.156] 14.43 0.09
0 9 Si8 Cmme -5.986[ 0.032] 14.93 0.26
------Gaussian Process Regression------
Kernel: 10.025**2 *RBF(length=0.163) 13 energy (0.002) 1 forces (0.020)
Loss: 21.928 10.025 0.163 0.002
Loss: 212572.031 9.826 10.000 0.002
Loss: 21.910 10.025 0.165 0.002
Struc 0 is picked: -4.473[ 0.045] -> DFT energy: -4.562 eV/atom in 0.11 minutes
------Gaussian Process Regression------
Kernel: 10.025**2 *RBF(length=0.165) 14 energy (0.002) 1 forces (0.020)
Loss: 24.116 10.025 0.165 0.002
Loss: 213495.041 9.872 10.000 0.002
Loss: 24.105 10.025 0.166 0.002
Struc 5 is picked: -6.818[ 0.031] -> DFT energy: -4.646 eV/atom in 0.35 minutes
0 energy and 0 force points will be removed
1 0 Si8 Fmmm -17.992[ 0.156] 9.29 0.22
1 1 Si8 Pm-3m -6.140[ 0.008] 14.52 0.10
1 2 Si8 P4_12_12 -9.515[ 0.156] 14.24 0.33
1 3 Si8 P4/mmm -5.933[ 0.157] 21.46 0.14
@qzhu2017
I believe the GPR predict
function is wrong:
https://github.com/qzhu2017/CSP_BO/blob/bd42f83996be8ddf51c5381bbccae376747d1624/CSP_ON_THE_FLY/csp.py#L99
Please look at this print:
0 0 Si8 Fm-3m -33.215[ 0.089] 3.39 0.74
0 1 Si8 P-1 -8.481[ 0.088] 18.62 0.35
0 2 Si8 P4/nmm -6.004[ 0.158] 20.69 0.18
0 3 Si8 Pm-3m -3.853[ 0.002] 30.05 0.04
0 4 Si8 P-3m1 -13.016[ 0.122] 24.73 0.32
0 5 Si8 Pm-3m -16.941[ 0.072] 4.35 0.73
0 6 Si8 P4/mmm -6.808[ 0.030] 16.65 0.16
0 7 Si8 P-62m -8.593[ 0.087] 17.83 0.39
0 8 Si8 Aem2 -14.111[ 0.155] 10.57 0.34
0 9 Si8 P6/m -17.781[ 0.127] 8.15 0.50
##### The output of the predict function #####
MEAN :
[ 9.61721477 -4.03342387 0.06144399 -2.79338743 0.05972591 10.7512382
-2.33564639 -2.83836347 0.23613284 1.19894331]
COV:
[[ 3.26238287e+01 -9.10718865e-01 6.31215754e-01 1.98279625e-02
1.88620340e+00 2.61793839e+01 5.37950297e-01 2.11375443e+00
6.89429162e+00 1.49797814e+01]
[-9.10718865e-01 3.16343007e+01 9.38334892e-03 2.82088429e-02
-2.58953778e-02 -7.68482919e-01 3.92704777e+00 1.23434156e+01
-2.33296409e-01 1.32986462e-01]
[ 6.31215754e-01 9.38334892e-03 1.02128291e+02 -3.18252580e-04
1.42621114e+00 4.77218414e-01 3.11392945e-02 2.56575471e-01
2.24734769e+01 1.07867609e+01]
[ 1.98279625e-02 2.82088429e-02 -3.18252580e-04 1.21122775e-02
-3.44314462e-03 1.79116625e-02 3.69137987e-02 1.97888068e-02
-8.24397205e-03 -1.24045934e-02]
[ 1.88620340e+00 -2.58953778e-02 1.42621114e+00 -3.44314462e-03
6.08949558e+01 1.43827611e+00 -3.79100113e-02 6.03578224e-01
1.08140283e+01 1.02676675e+01]
[ 2.61793839e+01 -7.68482919e-01 4.77218414e-01 1.79116625e-02
1.43827611e+00 2.13000972e+01 4.96307440e-01 1.62608123e+00
5.23092325e+00 1.13592472e+01]
[ 5.37950297e-01 3.92704777e+00 3.11392945e-02 3.69137987e-02
-3.79100113e-02 4.96307440e-01 3.77066352e+00 4.35808920e+00
-4.81239311e-02 5.59802411e-01]
[ 2.11375443e+00 1.23434156e+01 2.56575471e-01 1.97888068e-02
6.03578224e-01 1.62608123e+00 4.35808920e+00 3.10961262e+01
1.56417394e+00 5.02571761e+00]
[ 6.89429162e+00 -2.33296409e-01 2.24734769e+01 -8.24397205e-03
1.08140283e+01 5.23092325e+00 -4.81239311e-02 1.56417394e+00
9.89965805e+01 6.54796173e+01]
[ 1.49797814e+01 1.32986462e-01 1.07867609e+01 -1.24045934e-02
1.02676675e+01 1.13592472e+01 5.59802411e-01 5.02571761e+00
6.54796173e+01 6.56236130e+01]]
BO Sampling:
[16.36505287 -8.40947657 9.41068708 -2.8463476 -4.00344943 16.27422764
-1.90551664 -4.25489791 6.22151493 8.32061537]
@yanxon
The function of predict
only returns the GP energy. But the total energy is the sum of GP energy + LJ energy. You need to add the return_cov to the predict_structure
function. As we discussed yesterday, the collect_data
function should be moved outside of the loop.
Also, I suggest you also try some simpler acquisition.
@qzhu2017
I will work on the collect_data
later and some other acquisition later.
You are right. The total energy is GP + LJ.
If the BO algorithm is correct, it will always pick the 0-th structure since it has the lowest energy.
0 0 Si8 Fm-3m -33.215[ 0.089] 3.39 0.74
But in reality, it will more likely to pick the 1st structure which has the energy of `-4.03342387. We need to include the LJ potential so that the BO algorithm will more likely to pick the 0-th structure.
If we don't solve this energy issue, adding other acquisition function won't help.
@qzhu2017
I fix the energy issue with GP + LJ. Now the selection is reasonable:
0 0 Si8 I4_1/amd -7.477[ 0.021] 13.85 0.26
0 1 Si8 P6_3/mmc -4.604[ 0.107] 24.76 0.14
0 2 Si8 P6/mmm -22.921[ 0.141] 7.02 0.49
0 3 Si8 Pm-3m -6.328[ 0.009] 14.64 0.29
0 4 Si8 Pmc2_1 -9.059[ 0.087] 13.29 0.50
running on 16 total cores
distrk: each k-point on 16 cores, 1 groups
...
Struc 2 is picked: -22.921[ 0.141] -> DFT energy: 12.446 eV/atom in 0.64 minutes
I am still working on making the standard deviation and the covariance matrix. I believe the covariance matrix and the standard deviation are still inconsistent.
- [ ] add BO selection (needs a serious check)
I believe this time the BO selection is correct after fixes in 9fc1bda61b9655f8bf1b622d347c83c56bad53c7 and c770312520d0b42611610886d4cb68e0eaa25371.
Perhaps, we need to fix the standard deviation in #38. Please let me know what do you think.
@qzhu2017
Now I will work on getting collect_data
outside the loop. After that, I will try different acquisition functions.
@qzhu2017
I got some results from CSP with BO for Si with the alpha=2
The structure is picked at the 19th generation:
897 19 0 Si8 Im-3m -4.999[ 0.000] 14.23 0.34
898 19 1 Si8 Cmmm -6.445[ 0.070] 16.70 0.41
899 19 2 Si8 Pbam -4.956[ 0.064] 21.08 0.36
900 19 3 Si8 Fd-3m -5.696[ 0.022] 22.09 0.08
901 19 4 Si8 Pm-3m -5.214[ 0.000] 16.73 0.02
902 19 5 Si8 I4_132 -3.250[ 0.098] 25.00 0.05
903 19 6 Si8 Pm-3m -5.214[ 0.000] 16.73 0.09
904 19 7 Si8 Pm-3m -5.214[ 0.000] 16.73 0.10
905 19 8 Si8 P6/mmm -4.000[ 0.229] 51.43 0.32
906 19 9 Si8 P-1 -4.042[ 0.120] 41.66 0.65
907 19 10 Si8 P6/mmm -2.244[ 0.137] 21.16 0.08
908 19 11 Si8 Pm-3m -3.917[ 0.001] 32.62 0.08
909 19 12 Si8 P4/mmm -2.839[ 0.180] 27.80 0.16
910 19 13 Si8 P4/mmm -5.181[ 0.037] 22.86 0.16
911 19 14 Si8 P222 -5.240[ 0.037] 13.19 0.90
912 19 15 Si8 P-1 -6.362[ 0.104] 44.44 0.24
913 19 16 Si8 Pm-3m -5.214[ 0.000] 16.73 0.16
914 19 17 Si8 Pm-3m -5.214[ 0.000] 16.73 0.10
915 19 18 Si8 P4/mmm -3.842[ 0.131] 24.73 0.28
916 19 19 Si8 Pm-3m -5.214[ 0.000] 16.73 0.15
917 19 20 Si8 P6_322 -5.549[ 0.069] 26.53 0.24
918 19 21 Si8 P-1 -5.765[ 0.041] 21.12 0.58
919 19 22 Si8 P6/mmm -5.733[ 0.001] 14.71 0.10
920 19 23 Si8 P6/mmm -4.094[ 0.166] 80.32 0.10
Firstly, the algorithm picked 15th, 20th, and 13th structures. Then, it picked the 3rd structure. In my case, this will be after 46 single-point dft, excluding the initial database.
1032 3-th structure is picked from the BO selection
1033 running on 16 total cores
1034 distrk: each k-point on 16 cores, 1 groups
1035 distr: one band on 2 cores, 8 groups
1036 using from now: INCAR
1037 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Jul 28 2017 17:12:03) complex
1038 POSCAR found : 1 types and 8 ions
1039 scaLAPACK will be used
1040 LDA part: xc-table for Pade appr. of Perdew
1041 generate k-points for: 6 6 6
1042 POSCAR, INCAR and KPOINTS ok, starting setup
1043 FFT: planning ...
1044 WAVECAR not read
1045 WARNING: random wavefunctions but no delay for mixing, default for NELMDL
1046 entering main loop
1047 N E dE d eps ncg rms rms(c)
1048 DAV: 1 0.708267839324E+02 0.70827E+02 -0.11996E+04 960 0.109E+03
...
1058 DAV: 11 -0.431418740499E+02 -0.40764E-06 -0.77700E-06 784 0.145E-02
1059 1 F= -.43141874E+02 E0= -.43142258E+02 d E =0.115330E-02
1060 ------Gaussian Process Regression------
1061 Kernel: 22.048**2 *RBF(length=0.225) 55 energy (0.002) 43 forces (0.020)
1062
1063 Loss: 279.182 22.048 0.225 0.002
1064 Loss: 1797530.663 20.906 10.000 0.002
1065 Loss: 279.099 22.048 0.226 0.002
1066 3-th structure: E/atom: -5.393 eV/atom sg: Fd-3m
Again, it appears in the 23rd generation:
1806 23 0 Si8 I4/mmm -5.038[ 0.033] 26.76 0.34
1807 23 1 Si8 P4_2/mnm -2.600[ 0.133] 28.98 0.41
1808 23 2 Si8 P4_2/n -5.055[ 0.026] 21.26 0.51
1809 23 3 Si8 P6_3/m -4.002[ 0.001] 9.25 0.76
1810 23 4 Si8 P6/mmm -6.855[ 0.101] 86.21 0.19
1811 23 5 Si8 P4/mbm -2.734[ 0.041] 21.95 0.34
1812 23 6 Si8 Cccm -5.225[ 0.090] 34.10 0.17
1813 23 7 Si8 Pm-3m -5.207[ 0.000] 16.73 0.10
1814 23 8 Si8 Pmma -3.636[ 0.151] 61.86 0.36
1815 23 9 Si8 P6_3/mmc -5.678[ 0.034] 12.89 0.44
1816 23 10 Si8 P4_2/mmc -4.374[ 0.001] 14.39 0.57
1817 23 11 Si8 I4_1/amd -3.278[ 0.001] 9.65 0.49
1818 23 12 Si8 P-6m2 -4.281[ 0.038] 39.45 0.27
1819 23 13 Si8 Im-3m -4.998[ 0.000] 14.22 0.26
1820 23 14 Si8 Pmmm -5.968[ 0.101] 21.79 0.20
1821 23 15 Si8 P-1 -2.322[ 0.036] 29.38 0.41
1822 23 16 Si8 I4/mmm -5.269[ 0.001] 14.36 0.24
1823 23 17 Si8 I4_1/amd -5.626[ 0.024] 18.88 0.29
1824 23 18 Si8 Pm-3m -5.207[ 0.000] 16.73 0.46
1825 23 19 Si8 Fd-3m -5.455[ 0.001] 21.08 0.11
1826 23 20 Si8 Pm-3m -5.207[ 0.000] 16.73 0.08
1827 23 21 Si8 P6/mmm -5.489[ 0.029] 16.83 0.40
1828 23 22 Si8 P1 -3.178[ 0.083] 38.43 0.47
1829 23 23 Si8 P-1 -4.672[ 0.214] 31.13 0.49
1830 23 24 Si8 P4_2/mmc -5.737[ 0.056] 37.29 0.26
1831 23 25 Si8 Pm-3m -5.207[ 0.000] 16.73 0.12
1832 23 26 Si8 P4_2/nmc -5.320[ 0.327] 39.87 0.31
1833 23 27 Si8 Immm -1.872[ 0.086] 18.77 0.40
After single-point of structures 8th, 14th, 9th, and 17th, Fd-3m is picked. Now the GPR give a better energy description:
2014 19-th structure is picked from the BO selection
2015 running on 16 total cores
2016 distrk: each k-point on 16 cores, 1 groups
2017 distr: one band on 2 cores, 8 groups
2018 using from now: INCAR
2019 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Jul 28 2017 17:12:03) complex
2020 POSCAR found : 1 types and 8 ions
2021 scaLAPACK will be used
2022 LDA part: xc-table for Pade appr. of Perdew
2023 generate k-points for: 6 6 6
2024 POSCAR, INCAR and KPOINTS ok, starting setup
2025 FFT: planning ...
2026 WAVECAR not read
2027 WARNING: random wavefunctions but no delay for mixing, default for NELMDL
2028 entering main loop
2029 N E dE d eps ncg rms rms(c)
2030 DAV: 1 0.699372618464E+02 0.69937E+02 -0.12078E+04 960 0.110E+03
...
2040 DAV: 11 -0.433574012902E+02 -0.85334E-06 -0.90574E-06 784 0.154E-02
2041 1 F= -.43357401E+02 E0= -.43357773E+02 d E =0.111427E-02
2042 ------Gaussian Process Regression------
2043 Kernel: 22.045**2 *RBF(length=0.248) 75 energy (0.002) 59 forces (0.020)
2044
2045 Loss: 391.044 22.045 0.248 0.002
2046 Loss: 2785916.424 20.503 10.000 0.002
2047 Loss: 391.024 22.045 0.248 0.002
2048 19-th structure: E/atom: -5.420 eV/atom sg: Fd-3m
At last, I believe the GPR predict the Fd-3m consistently at energy of -5.428 eV
:
39 9 Si8 Fd-3m -5.428[ 0.000] 20.65 0.21
@yanxon I think the results are much better. However, there needs an improvement that the structures with zero sigma should not be selected.
------Gaussian Process Regression------
Kernel: 22.296**2 *RBF(length=0.191) 240 energy (0.004) 190 forces (0.041)
25 0 Si8 P6_3/mmc -3.046[ 0.238] 42.36 0.32
25 1 Si8 Pm-3m -5.058[ 0.000] 17.44 0.39
skip the duplicate structures
25 3 Si8 Cmmm -3.649[ 0.067] 35.55 0.39
25 4 Si8 P-6c2 -4.220[ 0.052] 32.00 1.00
25 5 Si8 P-1 -2.293[ 0.092] 42.90 1.16
25 6 Si8 Fm-3m -4.766[ 0.000] 20.66 0.18
25 7 Si8 P1 -3.128[ 0.024] 25.21 1.59
25 8 Si8 P6/mmm -4.650[ 0.009] 27.16 0.31
25 9 Si8 P4/mmm -4.091[ 0.049] 29.02 0.92
25 10 Si8 P4/mmm -3.365[ 0.052] 29.49 0.55
25 11 Si8 I4/mmm -4.990[ 0.000] 16.50 0.37
25 12 Si8 P4/mmm -4.201[ 0.012] 18.38 0.63
25 13 Si8 P1 -3.681[ 0.062] 44.76 0.85
25 14 Si8 P-62m -3.745[ 0.098] 40.38 0.48
25 15 Si8 P1 -4.822[ 0.046] 30.02 1.32
25 16 Si8 I4/mmm -3.960[ 0.002] 29.25 0.31
25 17 Si8 Fd-3m -5.425[ 0.000] 20.55 0.10
25 18 Si8 Imm2 -4.507[ 0.011] 24.97 0.93
skip the duplicate structures
25 20 Si8 P222_1 -3.910[ 0.010] 14.28 1.45
skip the duplicate structures
25 22 Si8 Pm-3n -4.551[ 0.004] 22.17 0.25
25 23 Si8 P4/mmm -4.218[ 0.000] 40.83 0.40
25 24 Si8 P6_3/mmc -4.365[ 0.055] 23.97 1.05
25 25 Si8 P-1 -4.315[ 0.009] 23.71 1.22
25 26 Si8 P4/mmm -4.384[ 0.008] 24.11 0.57
25 27 Si8 P-6m2 -4.473[ 0.018] 14.63 0.92
25 28 Si8 I4/mmm -10.528[ 0.206] 22.44 0.79
skip the duplicate structures
25 30 Si8 P-1 -4.927[ 0.001] 16.56 1.53
25 31 Si8 P1 -4.210[ 0.030] 34.01 1.11
25 32 Si8 P1 -4.569[ 0.006] 24.05 1.45
25 33 Si8 P6_3cm -4.513[ 0.007] 15.40 1.38
25 34 Si8 Pmmm -4.248[ 0.061] 32.82 0.42
skip the duplicate structures
25 36 Si8 Cmme -3.260[ 0.059] 57.25 0.44
25 37 Si8 Pm-3m -3.879[ 0.000] 29.90 0.11
25 38 Si8 P4/mmm -4.848[ 0.019] 52.75 0.41
25 39 Si8 P1 -3.410[ 0.077] 54.91 0.71
25 40 Si8 P6_3/mcm -4.482[ 0.007] 22.85 0.87
25 41 Si8 P4_2/mmc -4.707[ 0.059] 24.66 0.36
25 42 Si8 P6/mmm -4.459[ 0.078] 38.64 0.16
25 43 Si8 Cmmm -3.619[ 0.104] 20.94 0.61
25 44 Si8 Imma -3.626[ 0.162] 37.27 0.47
25 45 Si8 P-1 -4.145[ 0.023] 23.19 1.00
25 46 Si8 Pmn2_1 -6.177[ 0.031] 19.92 1.34
25 47 Si8 I4/mmm -1.618[ 0.326] 39.29 0.33
25 48 Si8 P4/mmm -3.164[ 0.021] 14.74 0.59
25 49 Si8 P6mm -3.638[ 0.097] 43.86 1.35
Struc 25 is picked: -10.528[ 0.206] -> DFT energy: -2.433 eV/atom in 0.28 minutes
Struc 41 is picked: -6.177[ 0.031] -> DFT energy: -3.265 eV/atom in 0.55 minutes
**Struc 16 is picked: -5.425[ 0.000] -> DFT energy: -5.425 eV/atom in 0.11 minutes**
Struc 36 is picked: -4.707[ 0.059] -> DFT energy: -4.481 eV/atom in 0.15 minutes
Struc 37 is picked: -4.459[ 0.078] -> DFT energy: -4.044 eV/atom in 0.26 minutes
Struc 1 is picked: -5.058[ 0.000] -> DFT energy: -5.054 eV/atom in 0.20 minutes
Struc 10 is picked: -4.990[ 0.000] -> DFT energy: -4.832 eV/atom in 0.16 minutes
Struc 26 is picked: -4.927[ 0.001] -> DFT energy: -4.865 eV/atom in 2.46 minutes
Struc 33 is picked: -4.848[ 0.019] -> DFT energy: -2.409 eV/atom in 0.44 minutes
@qzhu2017
2021/01/10
@yanxon I think the results are much better. However, there needs an improvement that the structures with zero sigma should not be selected.
This is fixed in 8b123ca20372c23e930a24db8a59c0d7ed3d924a.
@qzhu2017
collect_data
is moved outside the loop (6ca1b06397d7a08e4743ee612519ae9ac7dbdf2f).
@qzhu2017
Here I have a criterion to get the minimum energy (E_min
in eV/atom) found so far by the GPR:
https://github.com/qzhu2017/CSP_BO/blob/6ca1b06397d7a08e4743ee612519ae9ac7dbdf2f/CSP_ON_THE_FLY/csp.py#L438-L439
Initially, the E_min
will be determined based on the initial database, which is around -5.003 eV/atom. E_min
is required when probability of improvement or expected improvement is called by the BO.
I am thinking that whenever the BO hits new low, we should optimize the new lowest structure with VASP. Another route will be reoptimized immediately by GPR and do single-point optimization again.
@qzhu2017
I encountered this error with VASP while running csp-multi.py
:
775 Traceback (most recent call last):
776 File "csp-multi.py", line 480, in <module>
777 best_eng, best_forces = dft_run(best_struc, path=calc_folder)
778 File "csp-multi.py", line 231, in dft_run
779 eng = struc.get_potential_energy()
780 File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/atoms.py", line 733, in get_potential_energy
781 energy = self._calc.get_potential_energy(self)
782 File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 228, in get_potential_energy
783 self.update(atoms)
784 File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 82, in update
785 self.calculate(atoms)
786 File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 105, in calculate
787 self.run()
788 File "/scratch/yanxonh/SOFTWARE/anaconda3/lib/python3.8/site-packages/ase-3.20.1-py3.8.egg/ase/calculators/vasp/vasp.py", line 176, in run
789 raise RuntimeError('Vasp exited with exit code: %d. ' % exitcode)
790 RuntimeError: Vasp exited with exit code: 31744.
Hi @qzhu2017,
This is not the way to increase 20% of volume: https://github.com/qzhu2017/CSP_BO/blob/59cf8e0ae835550143a4ae86cadb89d90db15edf/CSP_ON_THE_FLY/csp-multi.py#L374
For a cubic cell, it means that volume is increased by 72.8%
:
a*1.2 * b*1.2 * c*1.2 = a*b*c * (1.2)**3 = V * 1.728
To increase by 20%, we need to set it to 1.063
instead of 1.2
.
Do you think this will matter to the initial database?
@qzhu2017
I just uploaded a script in 35b197abfd5035b3584c3e4d3ad624ccae46ca60.
You can run it with this command:
python example_cell_size.py models/Si.json database/Si_244_DFT.db
GPR prediction of Si diamond structure before expansion
E: -5.415 eV/atom E_var: 0.0006907077 E_var_total: 0.0055256616 E0: -5.425 eV/atom F_MSE: 0.000 V: 20.44 A^3/atom
GPR prediction of diamond structure after expansion
E: -5.131 eV/atom E_var: 0.0077179254 E_var_total: 0.0617434035 V: 24.55 A^3/atom
GPR prediction of diamond structure (2,2,2) before expansion
E: -5.415 eV/atom E_var: 0.0000863347 E_var_total: 0.0055254196 E0: -5.425 V: 20.44 A^3/atom
GPR prediction of diamond structure (2,2,2) after expansion
E: -5.019 eV/atom E_var: 0.0012567027 E_var_total: 0.0804289713 V: 24.55 A^3/atom
If you look at the (1,1,1) diamond structure
vs (2,2,2) diamond structure
, they have the same energy. However, the std for (2,2,2) diamond structure
is smaller.
In addition, the energies after the expansion are different despite the same volume.
@yanxon I just made a fix with a correct expansion. The results will look like this:
qzhu@cms examples (master) $ python example_cell_size.py models/Si.json database/Si_244_DFT.db
Processed 50 structures
------Gaussian Process Regression------
Kernel: 6.245**2 *RBF(length=0.561) 25 energy (0.008) 82 forces (0.163)
load the GP model from models/Si.json
GPR prediction of diamond structure before expansion
E: -4.934 eV/atom E_var: 0.0048453197 E_var_total: 0.0387625579 V: 15.62 A^3/atom
GPR prediction of diamond structure after expansion
E: -5.361 eV/atom E_var: 0.0008895970 E_var_total: 0.0071167762 V: 18.77 A^3/atom
GPR prediction of diamond structure (2,2,2) before expansion
E: -4.934 eV/atom E_var: 0.0024226619 E_var_total: 0.0387625906 V: 15.62 A^3/atom
GPR prediction of diamond structure (2,2,2) after expansion
E: -5.361 eV/atom E_var: 0.0004448011 E_var_total: 0.0071168175 V: 18.77 A^3/atom
So the E_var
is inconsistent but the E_var_total
is consistent. I am expecting an opposite result. Can you look into this issue?
Needs to be selective when adding the force points. Currently, I am inclined to include the force points from low energy structures. This way, we don't need to compute too many Kff and Kefs, since they are considerably more expensive.
The current idea is to check how many force points can be neglected without losing too much accuracy. Needs some tests here.
@yanxon , I am thinking of the above workflow for a realistic structure prediction based on the on-the-fly ML learning.
I am working on it now.