pyscf / gpu4pyscf

A plugin to use NVIDIA GPUs in the PySCF package
GNU General Public License v3.0

Memory Overflow During Geometric Optimization of Large Molecule #149

Closed: ORCAaAaA-ui closed this issue 2 months ago

ORCAaAaA-ui commented 2 months ago

Hello,

I am encountering a GPU memory overflow while performing geometry optimization of the Tamoxifen molecule with gpu4pyscf and the SMD solvation model. The GPU memory usage momentarily reaches its maximum capacity, which terminates the calculation. The calculation proceeds without errors for smaller molecules, but fails for Tamoxifen with the following error message:


Traceback (most recent call last):
  File "02-opt.py", line 48, in <module>
    mol_eq = optimize(mf_GPU, maxsteps=1, callback=callback)
  File "/home/jeon/.local/lib/python3.8/site-packages/pyscf/geomopt/geometric_solver.py", line 189, in optimize
    return kernel(method, assert_convergence=assert_convergence, include_ghost=include_ghost,
  File "/home/jeon/.local/lib/python3.8/site-packages/pyscf/geomopt/geometric_solver.py", line 160, in kernel
    geometric.optimize.run_optimizer(customengine=engine, input=tmpf,
  File "/home/jeon/.local/lib/python3.8/site-packages/geometric/optimize.py", line 939, in run_optimizer
    progress = Optimize(coords, M, IC, engine, dirname, params)
  File "/home/jeon/.local/lib/python3.8/site-packages/geometric/optimize.py", line 778, in Optimize
    return optimizer.optimizeGeometry()
  File "/home/jeon/.local/lib/python3.8/site-packages/geometric/optimize.py", line 691, in optimizeGeometry
    self.calcEnergyForce()
  File "/home/jeon/.local/lib/python3.8/site-packages/geometric/optimize.py", line 282, in calcEnergyForce
    spcalc = self.engine.calc(self.X, self.dirname, read_data=(self.Iteration==0))
  File "/home/jeon/.local/lib/python3.8/site-packages/geometric/engine.py", line 253, in calc
    result = self.calc_new(coords, dirname)
  File "/home/jeon/.local/lib/python3.8/site-packages/pyscf/geomopt/geometric_solver.py", line 88, in calc_new
    energy, gradients = g_scanner(mol)
  File "/home/jeon/.local/lib/python3.8/site-packages/gpu4pyscf/grad/rhf.py", line 634, in __call__
    de = self.kernel(**kwargs)
  File "/home/jeon/.local/lib/python3.8/site-packages/gpu4pyscf/solvent/grad/smd.py", line 262, in kernel
    self.de_solvent+= pcm_grad.grad_solver(self.base.with_solvent, dm)
  File "/home/jeon/.local/lib/python3.8/site-packages/gpu4pyscf/solvent/grad/pcm.py", line 236, in grad_solver
    dD, dS, dSii = get_dD_dS(pcmobj.surface, dF, with_D=with_D, with_S=True)
  File "/home/jeon/.local/lib/python3.8/site-packages/gpu4pyscf/solvent/grad/pcm.py", line 138, in get_dD_dS
    dD = dD_dri * drij + dS_dr * (-nj/rij + 3.0*nj_rij/rij**2 * drij)
  File "cupy/_core/core.pyx", line 1305, in cupy._core.core._ndarray_base.__truediv__
  File "cupy/_core/_kernel.pyx", line 1347, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 645, in cupy._core._kernel._get_out_args_from_optionals
  File "cupy/_core/core.pyx", line 2779, in cupy._core.core._ndarray_init
  File "cupy/_core/core.pyx", line 237, in cupy._core.core._ndarray_base._init_fast
  File "cupy/cuda/memory.pyx", line 740, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1426, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1447, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1118, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1139, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1384, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1387, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 1,056,876,032 bytes (allocated so far: 17,953,574,400 bytes, limit set to: 22,854,323,404 bytes).

Has anyone experienced similar issues, or does anyone have suggestions on how to manage memory usage more effectively in such calculations? Any advice or adjustments to settings that might help bypass this memory constraint would be greatly appreciated.

Thank you!

wxj6000 commented 2 months ago

Thank you for raising this issue. We have observed a similar issue. Unfortunately, we don't have direct control over the GPU memory in this case: the solvent models use a large number of grid points to ensure accuracy. You can reduce lebedev_order, as in https://github.com/pyscf/gpu4pyscf/blob/master/examples/16-smd_solvent.py#L33, choosing one of the grid sizes listed in https://github.com/pyscf/pyscf/blob/master/pyscf/dft/LebedevGrid.py#L4999. Setting mf.with_solvent.lebedev_order = 17 should be enough in most cases, and it reduces the memory consumption considerably.
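
For example, a minimal sketch of such a setup (the molecule file, basis, functional, and solvent below are placeholders; the structure follows the linked SMD example):

import pyscf
from gpu4pyscf.dft import rks

# placeholder geometry and basis; replace with your own input
mol = pyscf.M(atom='tamoxifen.xyz', basis='def2-tzvpp', verbose=4)

mf = rks.RKS(mol, xc='B3LYP').density_fit()
mf = mf.SMD()                        # attach the SMD solvation model
mf.with_solvent.solvent = 'water'    # placeholder solvent choice
mf.with_solvent.lebedev_order = 17   # coarser angular grid -> fewer surface
                                     # points -> much lower GPU memory use
mf.kernel()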

You can also try calling mempool.free_all_blocks() to release cached GPU memory before the expensive calculations.
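
A minimal sketch of doing this from the geometry-optimization callback (the callback name mirrors the one in your script; the memory-pool calls are the standard CuPy API):

import cupy

mempool = cupy.get_default_memory_pool()

def callback(envs):
    # Called by the optimizer after each step; returning cached device
    # memory to the pool before the next gradient evaluation can help
    # avoid the allocation failure.
    mempool.free_all_blocks()

# mol_eq = optimize(mf_GPU, maxsteps=50, callback=callback)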

In the future, this part of the Python code will be moved into a CUDA kernel for better memory efficiency.

wxj6000 commented 2 months ago

The memory usage has been improved in https://github.com/pyscf/gpu4pyscf/pull/150. It would be very helpful if you could test it on your side; you will need to compile the master branch for testing.