Open qzhu2017 opened 4 years ago
qzhu@cms CSP_BO (master) $ python example_validate.py models/test.json database/PtHO.db
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)
load the GP model from models/test.json
Train Energy [ 1]: R2 0.9975 MAE 0.000 RMSE 0.000
Train Forces [ 312]: R2 0.9645 MAE 0.013 RMSE 0.018
1 E: -5.265 -> -5.265 F_MSE: 0.033
2 E: -5.265 -> -5.265 F_MSE: 0.033
3 E: -5.265 -> -5.265 F_MSE: 0.035
4 E: -5.265 -> -5.265 F_MSE: 0.034
5 E: -5.265 -> -5.265 F_MSE: 0.034
6 E: -5.265 -> -5.265 F_MSE: 0.034
7 E: -5.265 -> -5.265 F_MSE: 0.033
8 E: -5.265 -> -5.265 F_MSE: 0.033
9 E: -5.265 -> -5.265 F_MSE: 0.034
10 E: -5.265 -> -5.265 F_MSE: 0.034
11 E: -5.265 -> -5.265 F_MSE: 0.034
12 E: -5.265 -> -5.265 F_MSE: 0.034
13 E: -5.265 -> -5.265 F_MSE: 0.033
14 E: -5.265 -> -5.265 F_MSE: 0.033
15 E: -5.265 -> -5.265 F_MSE: 0.034
16 E: -5.265 -> -5.265 F_MSE: 0.034
17 E: -5.265 -> -5.265 F_MSE: 0.034
18 E: -5.265 -> -5.265 F_MSE: 0.034
19 E: -5.265 -> -5.265 F_MSE: 0.033
20 E: -5.265 -> -5.265 F_MSE: 0.033
21 E: -5.265 -> -5.265 F_MSE: 0.034
22 E: -5.265 -> -5.265 F_MSE: 0.034
23 E: -5.265 -> -5.265 F_MSE: 0.034
24 E: -5.265 -> -5.265 F_MSE: 0.034
25 E: -5.265 -> -5.265 F_MSE: 0.033
26 E: -5.265 -> -5.265 F_MSE: 0.033
27 E: -5.265 -> -5.265 F_MSE: 0.034
28 E: -5.265 -> -5.265 F_MSE: 0.034
29 E: -5.265 -> -5.265 F_MSE: 0.034
30 E: -5.265 -> -5.265 F_MSE: 0.034
31 E: -5.265 -> -5.265 F_MSE: 0.032
32 E: -5.265 -> -5.265 F_MSE: 0.033
33 E: -5.265 -> -5.265 F_MSE: 0.034
34 E: -5.265 -> -5.265 F_MSE: 0.033
35 E: -5.265 -> -5.265 F_MSE: 0.035
36 E: -5.265 -> -5.265 F_MSE: 0.034
37 E: -5.265 -> -5.265 F_MSE: 0.033
38 E: -5.265 -> -5.265 F_MSE: 0.034
39 E: -5.265 -> -5.265 F_MSE: 0.034
40 E: -5.265 -> -5.265 F_MSE: 0.034
41 E: -5.265 -> -5.265 F_MSE: 0.034
42 E: -5.265 -> -5.265 F_MSE: 0.034
43 E: -5.265 -> -5.265 F_MSE: 0.032
44 E: -5.265 -> -5.265 F_MSE: 0.033
45 E: -5.265 -> -5.265 F_MSE: 0.035
46 E: -5.265 -> -5.265 F_MSE: 0.033
47 E: -5.265 -> -5.265 F_MSE: 0.035
48 E: -5.265 -> -5.265 F_MSE: 0.034
49 E: -5.265 -> -5.265 F_MSE: 0.033
50 E: -5.265 -> -5.265 F_MSE: 0.034
51 E: -5.265 -> -5.265 F_MSE: 0.034
52 E: -5.265 -> -5.265 F_MSE: 0.033
53 E: -5.265 -> -5.265 F_MSE: 0.035
54 E: -5.265 -> -5.265 F_MSE: 0.034
55 E: -5.265 -> -5.265 F_MSE: 0.033
56 E: -5.265 -> -5.265 F_MSE: 0.033
57 E: -5.265 -> -5.265 F_MSE: 0.035
58 E: -5.265 -> -5.265 F_MSE: 0.033
59 E: -5.265 -> -5.265 F_MSE: 0.035
60 E: -5.265 -> -5.265 F_MSE: 0.034
61 E: -5.265 -> -5.265 F_MSE: 0.033
62 E: -5.265 -> -5.265 F_MSE: 0.034
63 E: -5.265 -> -5.265 F_MSE: 0.034
64 E: -5.265 -> -5.265 F_MSE: 0.033
65 E: -5.265 -> -5.265 F_MSE: 0.035
66 E: -5.265 -> -5.265 F_MSE: 0.034
67 E: -5.265 -> -5.265 F_MSE: 0.033
68 E: -5.265 -> -5.265 F_MSE: 0.033
69 E: -5.265 -> -5.265 F_MSE: 0.035
70 E: -5.265 -> -5.265 F_MSE: 0.033
71 E: -5.265 -> -5.265 F_MSE: 0.035
72 E: -5.265 -> -5.265 F_MSE: 0.034
73 E: -5.265 -> -5.265 F_MSE: 0.033
74 E: -5.265 -> -5.265 F_MSE: 0.034
75 E: -5.265 -> -5.265 F_MSE: 0.034
76 E: -5.265 -> -5.265 F_MSE: 0.033
77 E: -5.265 -> -5.265 F_MSE: 0.034
78 E: -5.265 -> -5.265 F_MSE: 0.034
79 E: -5.265 -> -5.265 F_MSE: 0.033
80 E: -5.265 -> -5.265 F_MSE: 0.033
81 E: -5.265 -> -5.265 F_MSE: 0.034
82 E: -5.265 -> -5.265 F_MSE: 0.033
83 E: -5.265 -> -5.265 F_MSE: 0.035
84 E: -5.265 -> -5.265 F_MSE: 0.034
85 E: -5.265 -> -5.265 F_MSE: 0.033
86 E: -5.265 -> -5.265 F_MSE: 0.033
87 E: -5.265 -> -5.265 F_MSE: 0.035
88 E: -5.265 -> -5.265 F_MSE: 0.033
89 E: -5.265 -> -5.265 F_MSE: 0.034
90 E: -5.265 -> -5.265 F_MSE: 0.034
91 E: -5.265 -> -5.265 F_MSE: 0.033
92 E: -5.265 -> -5.265 F_MSE: 0.033
93 E: -5.265 -> -5.265 F_MSE: 0.034
94 E: -5.265 -> -5.265 F_MSE: 0.033
95 E: -5.265 -> -5.265 F_MSE: 0.034
96 E: -5.265 -> -5.265 F_MSE: 0.034
97 E: -5.265 -> -5.265 F_MSE: 0.033
98 E: -5.265 -> -5.265 F_MSE: 0.034
99 E: -5.265 -> -5.265 F_MSE: 0.034
100 E: -5.265 -> -5.265 F_MSE: 0.034
Test Energy [ 100]: R2 0.5746 MAE 0.000 RMSE 0.000
Test Forces [57600]: R2 0.9087 MAE 0.020 RMSE 0.034
5326.568 seconds elapsed
save the figure to E.png
save the figure to F.png
The results are not bad, just too slow. We need to fix #10 before getting back to this issue.
qzhu@cms CSP_BO (master) $ python example_validate.py models/PtHO.json database/PtHO.db
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)
load the GP model from models/PtHO.json
gpu
Train Energy [ 1]: R2 0.9975 MAE 0.000 RMSE 0.000
Train Forces [ 312]: R2 0.9645 MAE 0.013 RMSE 0.018
False
1 E: -5.265 -> -5.265 F_MSE: 0.033
2 E: -5.265 -> -5.265 F_MSE: 0.033
3 E: -5.265 -> -5.265 F_MSE: 0.035
4 E: -5.265 -> -5.265 F_MSE: 0.034
5 E: -5.265 -> -5.265 F_MSE: 0.034
Test Energy [ 5]: R2 0.9825 MAE 0.000 RMSE 0.000
Test Forces [2880]: R2 0.9087 MAE 0.020 RMSE 0.034
176.038 seconds elapsed
save the figure to E.png
save the figure to F.png
qzhu@cms CSP_BO (master) $ python example_validate.py models/PtHO.json database/PtHO.db
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)
load the GP model from models/PtHO.json
gpu
Train Energy [ 1]: R2 0.9975 MAE 0.000 RMSE 0.000
Train Forces [ 312]: R2 0.9645 MAE 0.013 RMSE 0.018
False
1 E: -5.265 -> -5.265 F_MSE: 0.033
2 E: -5.265 -> -5.265 F_MSE: 0.033
3 E: -5.265 -> -5.265 F_MSE: 0.035
4 E: -5.265 -> -5.265 F_MSE: 0.034
5 E: -5.265 -> -5.265 F_MSE: 0.034
Test Energy [ 5]: R2 0.9825 MAE 0.000 RMSE 0.000
Test Forces [2880]: R2 0.9087 MAE 0.020 RMSE 0.034
66.691 seconds elapsed
save the figure to E.png
save the figure to F.png
@yanxon Can you update the code and run the following command?
$ python example_sampling.py database/PtHO.db > log-PtHO &
This script constructs the GPR force model for the PtHO data. It will probably take a couple of hours. We will discuss the results tomorrow.
@qzhu2017
I am running this now. I will update the results after it's done.
At some point, it complains that CUDA is running out of memory:
File "example_sampling.py", line 55, in <module>
model.fit()
File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 85, in fit
params, loss = self.optimize(obj_func, hyper_params, hyper_bounds)
File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 448, in optimize
jac=True, options={'maxiter': 10, 'ftol': 1e-3})
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 618, in minimize
callback=callback, **options)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 308, in _minimize_lbfgsb
finite_diff_rel_step=finite_diff_rel_step)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 262, in _prepare_scalar_function
finite_diff_rel_step, bounds, epsilon=epsilon)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 76, in __init__
self._update_fun()
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 166, in _update_fun
self._update_fun_impl()
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 73, in update_fun
self.f = fun_wrapped(self.x)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 70, in fun_wrapped
return fun(x, *args)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 74, in __call__
self._compute_if_needed(x, *args)
File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 68, in _compute_if_needed
fg = self.fun(x, *args)
File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 68, in obj_func
params, eval_gradient=True, clone_kernel=False)
File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 406, in log_marginal_likelihood
K, K_gradient = kernel.k_total_with_grad(self.train_x)
File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 123, in k_total_with_grad
C_ff, C_ff_s, C_ff_l = self.kff_many(data1[key1], data2[key2], True, True)
File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 250, in kff_many
C[i] = K_ff(x1, x_all, dx1dr, dxdr_all, sigma2, sigma02, zeta, grad, mask, device=self.device)
File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 446, in K_ff
tmp = (dx1dr[:,None,:,None,:] * d2D_dx1dx2[:,:,:,:,None]).sum(axis=(2)) #ijlm
File "cupy/core/core.pyx", line 940, in cupy.core.core.ndarray.__mul__
File "cupy/core/_kernel.pyx", line 836, in cupy.core._kernel.ufunc.__call__
File "cupy/core/_kernel.pyx", line 340, in cupy.core._kernel._get_out_args
File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
File "cupy/cuda/memory.pyx", line 518, in cupy.cuda.memory.alloc
File "cupy/cuda/memory.pyx", line 1085, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1106, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 934, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
File "cupy/cuda/memory.pyx", line 949, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
File "cupy/cuda/memory.pyx", line 697, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 1775955456 bytes (total 9663106048 bytes)
A possible fix is to split the training data into a few parts.
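For illustration only, here is a minimal NumPy sketch of that idea applied to the broadcasted product that fails in the traceback (a request of about 1.65 GiB, with roughly 9.0 GiB reported as the total). The function name `contract_in_batches`, the `batch_size` parameter, and the array shapes are assumptions inferred from the indexing pattern in the failing line, not the actual code in `Dot_mb.py`. The same slicing would probably carry over to CuPy arrays, but that is untested here.

```python
import numpy as np

def contract_in_batches(dx1dr, d2D, batch_size):
    """Batched version of
        (dx1dr[:, None, :, None, :] * d2D[:, :, :, :, None]).sum(axis=2)
    Assumed shapes (hypothetical): dx1dr is (i, k, m), d2D is (i, j, k, l).
    Slicing along j keeps the 5-D temporary at (i, batch, k, l, m)
    instead of (i, j, k, l, m).
    """
    n_i, n_k, n_m = dx1dr.shape
    _, n_j, _, n_l = d2D.shape
    out = np.empty((n_i, n_j, n_l, n_m), dtype=np.result_type(dx1dr, d2D))
    for start in range(0, n_j, batch_size):
        stop = min(start + batch_size, n_j)
        block = d2D[:, start:stop]  # (i, b, k, l)
        # Same broadcast-and-sum as the original line, on one slice only.
        out[:, start:stop] = (
            dx1dr[:, None, :, None, :] * block[:, :, :, :, None]
        ).sum(axis=2)
    return out
```

Peak temporary memory scales with `batch_size` rather than with the full length of the `j` axis, at the cost of a Python-level loop. Whether this is enough depends on how large the other axes are for the PtHO data.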
@qzhu2017
The results just came in. For me, it seems like the calculation stopped at step 1154 with:
Kernel: 50.000**2 *Dot(length=4.118) 4 energy (0.005) 154 forces (0.050)
I believe the computation exited because CUDA ran out of memory here as well.
Hi @qzhu2017
If I want to continue the PtH2O calculation, do I just use this command?
python3 example_sampling.py models/test.json database/PtHO.db > log
@yanxon You can also modify the script to make sure that you don't start with structure 0.
I see. This is just modifying the range of the for loop.
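A rough sketch of what that change might look like, assuming the script iterates over structures by integer index. The helper `remaining_structures` and the example numbers are hypothetical and are not taken from `example_sampling.py`.

```python
def remaining_structures(n_structures, start=0):
    """Return the indices still to be processed, skipping the first `start`."""
    if not 0 <= start <= n_structures:
        raise ValueError("start must lie between 0 and n_structures")
    return range(start, n_structures)

# Hypothetical usage: resume at structure 7 out of 10 instead of structure 0.
for i in remaining_structures(n_structures=10, start=7):
    print("processing structure", i)
```

In practice the starting index would be wherever the previous run stopped, and the previously saved model would be loaded before the loop begins.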