To try Pt-H2O data

qzhu2017 commented 4 years ago

[x] copy the data ab5a869
[x] do sampling to see how much data is needed to generate a reasonable force field 69032ff
[ ] After we get a reasonable FF, try to run MD

qzhu2017 commented 4 years ago

qzhu@cms CSP_BO (master) $ python example_validate.py models/test.json database/PtHO.db 
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)

load the GP model from  models/test.json
Train Energy [   1]: R2 0.9975 MAE  0.000 RMSE  0.000
Train Forces [ 312]: R2 0.9645 MAE  0.013 RMSE  0.018
   1 E: -5.265 -> -5.265  F_MSE:  0.033 
   2 E: -5.265 -> -5.265  F_MSE:  0.033 
   3 E: -5.265 -> -5.265  F_MSE:  0.035 
   4 E: -5.265 -> -5.265  F_MSE:  0.034 
   5 E: -5.265 -> -5.265  F_MSE:  0.034 
   6 E: -5.265 -> -5.265  F_MSE:  0.034 
   7 E: -5.265 -> -5.265  F_MSE:  0.033 
   8 E: -5.265 -> -5.265  F_MSE:  0.033 
   9 E: -5.265 -> -5.265  F_MSE:  0.034 
  10 E: -5.265 -> -5.265  F_MSE:  0.034 
  11 E: -5.265 -> -5.265  F_MSE:  0.034 
  12 E: -5.265 -> -5.265  F_MSE:  0.034 
  13 E: -5.265 -> -5.265  F_MSE:  0.033 
  14 E: -5.265 -> -5.265  F_MSE:  0.033 
  15 E: -5.265 -> -5.265  F_MSE:  0.034 
  16 E: -5.265 -> -5.265  F_MSE:  0.034 
  17 E: -5.265 -> -5.265  F_MSE:  0.034 
  18 E: -5.265 -> -5.265  F_MSE:  0.034 
  19 E: -5.265 -> -5.265  F_MSE:  0.033 
  20 E: -5.265 -> -5.265  F_MSE:  0.033 
  21 E: -5.265 -> -5.265  F_MSE:  0.034 
  22 E: -5.265 -> -5.265  F_MSE:  0.034 
  23 E: -5.265 -> -5.265  F_MSE:  0.034 
  24 E: -5.265 -> -5.265  F_MSE:  0.034 
  25 E: -5.265 -> -5.265  F_MSE:  0.033 
  26 E: -5.265 -> -5.265  F_MSE:  0.033 
  27 E: -5.265 -> -5.265  F_MSE:  0.034 
  28 E: -5.265 -> -5.265  F_MSE:  0.034 
  29 E: -5.265 -> -5.265  F_MSE:  0.034 
  30 E: -5.265 -> -5.265  F_MSE:  0.034 
  31 E: -5.265 -> -5.265  F_MSE:  0.032 
  32 E: -5.265 -> -5.265  F_MSE:  0.033 
  33 E: -5.265 -> -5.265  F_MSE:  0.034 
  34 E: -5.265 -> -5.265  F_MSE:  0.033 
  35 E: -5.265 -> -5.265  F_MSE:  0.035 
  36 E: -5.265 -> -5.265  F_MSE:  0.034 
  37 E: -5.265 -> -5.265  F_MSE:  0.033 
  38 E: -5.265 -> -5.265  F_MSE:  0.034 
  39 E: -5.265 -> -5.265  F_MSE:  0.034 
  40 E: -5.265 -> -5.265  F_MSE:  0.034 
  41 E: -5.265 -> -5.265  F_MSE:  0.034 
  42 E: -5.265 -> -5.265  F_MSE:  0.034 
  43 E: -5.265 -> -5.265  F_MSE:  0.032 
  44 E: -5.265 -> -5.265  F_MSE:  0.033 
  45 E: -5.265 -> -5.265  F_MSE:  0.035 
  46 E: -5.265 -> -5.265  F_MSE:  0.033 
  47 E: -5.265 -> -5.265  F_MSE:  0.035 
  48 E: -5.265 -> -5.265  F_MSE:  0.034 
  49 E: -5.265 -> -5.265  F_MSE:  0.033 
  50 E: -5.265 -> -5.265  F_MSE:  0.034 
  51 E: -5.265 -> -5.265  F_MSE:  0.034 
  52 E: -5.265 -> -5.265  F_MSE:  0.033 
  53 E: -5.265 -> -5.265  F_MSE:  0.035 
  54 E: -5.265 -> -5.265  F_MSE:  0.034 
  55 E: -5.265 -> -5.265  F_MSE:  0.033 
  56 E: -5.265 -> -5.265  F_MSE:  0.033 
  57 E: -5.265 -> -5.265  F_MSE:  0.035 
  58 E: -5.265 -> -5.265  F_MSE:  0.033 
  59 E: -5.265 -> -5.265  F_MSE:  0.035 
  60 E: -5.265 -> -5.265  F_MSE:  0.034 
  61 E: -5.265 -> -5.265  F_MSE:  0.033 
  62 E: -5.265 -> -5.265  F_MSE:  0.034 
  63 E: -5.265 -> -5.265  F_MSE:  0.034 
  64 E: -5.265 -> -5.265  F_MSE:  0.033 
  65 E: -5.265 -> -5.265  F_MSE:  0.035 
  66 E: -5.265 -> -5.265  F_MSE:  0.034 
  67 E: -5.265 -> -5.265  F_MSE:  0.033 
  68 E: -5.265 -> -5.265  F_MSE:  0.033 
  69 E: -5.265 -> -5.265  F_MSE:  0.035 
  70 E: -5.265 -> -5.265  F_MSE:  0.033 
  71 E: -5.265 -> -5.265  F_MSE:  0.035 
  72 E: -5.265 -> -5.265  F_MSE:  0.034 
  73 E: -5.265 -> -5.265  F_MSE:  0.033 
  74 E: -5.265 -> -5.265  F_MSE:  0.034 
  75 E: -5.265 -> -5.265  F_MSE:  0.034 
  76 E: -5.265 -> -5.265  F_MSE:  0.033 
  77 E: -5.265 -> -5.265  F_MSE:  0.034 
  78 E: -5.265 -> -5.265  F_MSE:  0.034 
  79 E: -5.265 -> -5.265  F_MSE:  0.033 
  80 E: -5.265 -> -5.265  F_MSE:  0.033 
  81 E: -5.265 -> -5.265  F_MSE:  0.034 
  82 E: -5.265 -> -5.265  F_MSE:  0.033 
  83 E: -5.265 -> -5.265  F_MSE:  0.035 
  84 E: -5.265 -> -5.265  F_MSE:  0.034 
  85 E: -5.265 -> -5.265  F_MSE:  0.033 
  86 E: -5.265 -> -5.265  F_MSE:  0.033 
  87 E: -5.265 -> -5.265  F_MSE:  0.035 
  88 E: -5.265 -> -5.265  F_MSE:  0.033 
  89 E: -5.265 -> -5.265  F_MSE:  0.034 
  90 E: -5.265 -> -5.265  F_MSE:  0.034 
  91 E: -5.265 -> -5.265  F_MSE:  0.033 
  92 E: -5.265 -> -5.265  F_MSE:  0.033 
  93 E: -5.265 -> -5.265  F_MSE:  0.034 
  94 E: -5.265 -> -5.265  F_MSE:  0.033 
  95 E: -5.265 -> -5.265  F_MSE:  0.034 
  96 E: -5.265 -> -5.265  F_MSE:  0.034 
  97 E: -5.265 -> -5.265  F_MSE:  0.033 
  98 E: -5.265 -> -5.265  F_MSE:  0.034 
  99 E: -5.265 -> -5.265  F_MSE:  0.034 
 100 E: -5.265 -> -5.265  F_MSE:  0.034 
Test Energy [ 100]: R2 0.5746 MAE  0.000 RMSE  0.000
Test Forces [57600]: R2 0.9087 MAE  0.020 RMSE  0.034
5326.568 seconds elapsed
save the figure to  E.png
save the figure to  F.png

The results are not bad, just too slow (5326.568 seconds for 100 test structures). Need to fix #10 before getting back to this issue.

qzhu2017 commented 4 years ago

10/30/2020: The CuPy version (35 s/structure) is faster than 24 CPUs (53 s/structure).

qzhu@cms CSP_BO (master) $ python example_validate.py models/PtHO.json database/PtHO.db
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)

load the GP model from  models/PtHO.json
gpu
Train Energy [   1]: R2 0.9975 MAE  0.000 RMSE  0.000
Train Forces [ 312]: R2 0.9645 MAE  0.013 RMSE  0.018
False
   1 E: -5.265 -> -5.265  F_MSE:  0.033 
   2 E: -5.265 -> -5.265  F_MSE:  0.033 
   3 E: -5.265 -> -5.265  F_MSE:  0.035 
   4 E: -5.265 -> -5.265  F_MSE:  0.034 
   5 E: -5.265 -> -5.265  F_MSE:  0.034 
Test Energy [   5]: R2 0.9825 MAE  0.000 RMSE  0.000
Test Forces [2880]: R2 0.9087 MAE  0.020 RMSE  0.034
176.038 seconds elapsed
save the figure to  E.png
save the figure to  F.png

qzhu2017 commented 4 years ago

11/08/2020

qzhu@cms CSP_BO (master) $ python example_validate.py models/PtHO.json database/PtHO.db 
------Gaussian Process Regression------
Kernel: 0.925**2 *Dot(length=4.256) 1 energy (0.002) 104 forces (0.024)

load the GP model from  models/PtHO.json
gpu
Train Energy [   1]: R2 0.9975 MAE  0.000 RMSE  0.000
Train Forces [ 312]: R2 0.9645 MAE  0.013 RMSE  0.018
False
   1 E: -5.265 -> -5.265  F_MSE:  0.033 
   2 E: -5.265 -> -5.265  F_MSE:  0.033 
   3 E: -5.265 -> -5.265  F_MSE:  0.035 
   4 E: -5.265 -> -5.265  F_MSE:  0.034 
   5 E: -5.265 -> -5.265  F_MSE:  0.034 
Test Energy [   5]: R2 0.9825 MAE  0.000 RMSE  0.000
Test Forces [2880]: R2 0.9087 MAE  0.020 RMSE  0.034
66.691 seconds elapsed
save the figure to  E.png
save the figure to  F.png
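
For comparison, the per-structure cost of the three validation runs can be worked out from the logged elapsed times and test-set sizes. The snippet below is only a small arithmetic sketch; the run labels are descriptive names chosen here, and the first run is assumed to be the 24-CPU one because its rate matches the 53 s/structure quoted above.

```python
# Elapsed seconds and number of test structures, copied from the three logs.
runs = {
    "CPU run (first log)": (5326.568, 100),
    "CuPy run, 10/30/2020": (176.038, 5),
    "CuPy run, 11/08/2020": (66.691, 5),
}

# Average wall time per test structure for each run.
per_structure = {name: seconds / n for name, (seconds, n) in runs.items()}

for name, rate in per_structure.items():
    print(f"{name}: {rate:.1f} s/structure")
```

This gives about 53.3, 35.2, and 13.3 s/structure, so the 11/08/2020 run is roughly four times faster per structure than the CPU run.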

qzhu2017 commented 4 years ago

@yanxon Can you update the code and run the following command?

$ python example_sampling.py database/PtHO.db > log-PtHO &

This script constructs the GPR force model for the PtHO data. It will probably take a couple of hours. We will discuss the results tomorrow.

yanxon commented 4 years ago

@qzhu2017

I am running this now. I will update the results after it's done.

qzhu2017 commented 4 years ago

At some point, it will complain that CUDA is running out of memory:

  File "example_sampling.py", line 55, in <module>
    model.fit()
  File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 85, in fit
    params, loss = self.optimize(obj_func, hyper_params, hyper_bounds)
  File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 448, in optimize
    jac=True, options={'maxiter': 10, 'ftol': 1e-3})
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 618, in minimize
    callback=callback, **options)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 308, in _minimize_lbfgsb
    finite_diff_rel_step=finite_diff_rel_step)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 262, in _prepare_scalar_function
    finite_diff_rel_step, bounds, epsilon=epsilon)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 76, in __init__
    self._update_fun()
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 166, in _update_fun
    self._update_fun_impl()
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 73, in update_fun
    self.f = fun_wrapped(self.x)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 70, in fun_wrapped
    return fun(x, *args)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 74, in __call__
    self._compute_if_needed(x, *args)
  File "/scratch/qzhu/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 68, in _compute_if_needed
    fg = self.fun(x, *args)
  File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 68, in obj_func
    params, eval_gradient=True, clone_kernel=False)
  File "/scratch/qzhu/github/CSP_BO/cspbo/gaussianprocess_ef.py", line 406, in log_marginal_likelihood
    K, K_gradient = kernel.k_total_with_grad(self.train_x)
  File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 123, in k_total_with_grad
    C_ff, C_ff_s, C_ff_l = self.kff_many(data1[key1], data2[key2], True, True)
  File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 250, in kff_many
    C[i] = K_ff(x1, x_all, dx1dr, dxdr_all, sigma2, sigma02, zeta, grad, mask, device=self.device)
  File "/scratch/qzhu/github/CSP_BO/cspbo/Dot_mb.py", line 446, in K_ff
    tmp = (dx1dr[:,None,:,None,:] * d2D_dx1dx2[:,:,:,:,None]).sum(axis=(2)) #ijlm
  File "cupy/core/core.pyx", line 940, in cupy.core.core.ndarray.__mul__
  File "cupy/core/_kernel.pyx", line 836, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 340, in cupy.core._kernel._get_out_args
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 518, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1085, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1106, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 934, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 949, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 697, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 1775955456 bytes (total 9663106048 bytes)

A possible fix is to split the training data into a few parts.
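
The traceback shows the allocation failing on a large broadcast product inside K_ff. As a rough illustration of the splitting idea, the sketch below evaluates a kernel matrix one slice of the second data set at a time, so the broadcast temporary stays small. Everything here is an assumption made for illustration: `toy_kernel` and `toy_kernel_batched` are hypothetical names, the kernel is a stand-in and not the Dot kernel in cspbo, and NumPy is used so the example runs without a GPU.

```python
import numpy as np

def toy_kernel(x1, x2):
    """Toy kernel K[i, j] = sum_k exp(-|x1[i, k] - x2[j, k]|).

    The broadcast builds a (len(x1), len(x2), d) temporary, the kind of
    intermediate array that exhausts memory for large inputs.
    """
    diff = np.abs(x1[:, None, :] - x2[None, :, :])
    return np.exp(-diff).sum(axis=2)

def toy_kernel_batched(x1, x2, batch_size=64):
    """Same result, but x2 is processed in slices of at most batch_size rows,
    so the temporary is only (len(x1), batch_size, d) at any time."""
    K = np.empty((x1.shape[0], x2.shape[0]))
    for start in range(0, x2.shape[0], batch_size):
        stop = start + batch_size  # slicing past the end is safe in NumPy
        K[:, start:stop] = toy_kernel(x1, x2[start:stop])
    return K

# Random stand-in descriptors (not real PtHO data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=(50, 8))
x2 = rng.normal(size=(300, 8))

K_full = toy_kernel(x1, x2)
K_batched = toy_kernel_batched(x1, x2, batch_size=64)
print(np.allclose(K_full, K_batched))
```

The batched version trades one large temporary for several small ones; a smaller `batch_size` lowers peak memory at the cost of more loop iterations.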

yanxon commented 4 years ago

@qzhu2017 The results just came in. For me, it seems like the calculation stopped at step 1154 with: Kernel: 50.000**2 *Dot(length=4.118) 4 energy (0.005) 154 forces (0.050)

I believe the computation exited because CUDA ran out of memory as well.

yanxon commented 4 years ago

Hi @qzhu2017

If I want to continue the PtH2O calculation, do I just use this command?

python3 example_sampling.py models/test.json database/PtHO.db > log

qzhu2017 commented 4 years ago

@yanxon You can also modify the script to make sure that you don't start with structure 0.

yanxon commented 4 years ago

> @yanxon You can also modify the script to make sure that you don't start with structure 0.

I see. That just means modifying the range of the for loop.
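
A minimal sketch of that change, assuming the script loops over structure indices: the helper name `structures_to_process` and the numbers used are hypothetical and not taken from example_sampling.py.

```python
def structures_to_process(n_structures, start=0):
    """Return the indices still to be processed, skipping those before start."""
    if not 0 <= start <= n_structures:
        raise ValueError("start must lie between 0 and n_structures")
    return range(start, n_structures)

# Hypothetical example: 10 structures in total, resuming at index 7.
remaining = list(structures_to_process(10, start=7))
print(remaining)  # [7, 8, 9]
```

In the real script, `start` would be set to the first structure that has not yet been added to the model.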

qzhu2017 / CSP_BO

To try Pt-H2O data #17
