Open GiacomoDG96 opened 8 months ago
It seems that CuPy didn't find cuBLAS. Can you make sure the CUDA Toolkit is installed on your system? If it is installed, you can check whether cupy.dot works properly.
CUDA Toolkit is installed. When I run nvcc --version I obtain:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
I have also tried cupy.dot with a toy example and it works.
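For anyone repeating this check, a toy test of that kind might look like the sketch below. The helper name check_dot and the 64x64 matrix size are illustrative assumptions, not part of CuPy or gpu4pyscf; the function takes the array module as an argument so that passing cupy exercises cupy.dot (and therefore cuBLAS) while the reference result comes from NumPy.

```python
import numpy as np


def check_dot(xp, n=64, seed=0):
    """Return True if xp.dot agrees with NumPy on a small random product.

    xp is an array module such as cupy (hypothetical helper, for illustration).
    """
    rng = np.random.default_rng(seed)
    a_host = rng.standard_normal((n, n))
    b_host = rng.standard_normal((n, n))
    expected = a_host @ b_host  # reference result computed on the CPU

    # Move the inputs to the module under test and run its dot product.
    result = xp.dot(xp.asarray(a_host), xp.asarray(b_host))

    # cupy exposes asnumpy() to copy back to the host; NumPy itself does not.
    to_host = getattr(xp, "asnumpy", np.asarray)
    return bool(np.allclose(to_host(result), expected))


# On a machine with a working GPU and CuPy installed:
#   import cupy
#   print(check_dot(cupy))
```

A small product like this only shows that cuBLAS can be initialized when the GPU is otherwise idle; it does not reproduce the memory pressure of a full DFT run.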
@GiacomoDG96 OK, great. Possibly the GPU doesn't have enough free memory left for the cuBLAS handle. Can you try limiting the CuPy memory pool? https://docs.cupy.dev/en/stable/user_guide/memory.html#limiting-gpu-memory-usage
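A minimal sketch of the two ways to cap the pool that the linked page describes: the CUPY_GPU_MEMORY_LIMIT environment variable and MemoryPool.set_limit. The helper names and the 80% figure are assumptions chosen for illustration, and whether gpu4pyscf adjusts the pool itself on import has not been verified here.

```python
import os


def set_cupy_memory_limit_env(percent=80):
    """Ask CuPy to cap its memory pool at `percent` of each GPU's memory.

    Hypothetical helper. Call it before CuPy is imported so the variable is
    already in place when the pool is created.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    value = f"{percent}%"
    os.environ["CUPY_GPU_MEMORY_LIMIT"] = value
    return value


def set_cupy_memory_limit_in_process(percent=80):
    """Cap the default pool of the current device from inside Python.

    Requires CuPy and a visible GPU, so it is not called at import time here.
    """
    import cupy

    pool = cupy.get_default_memory_pool()
    pool.set_limit(fraction=percent / 100)
    return pool.get_limit()  # limit in bytes


# Typical use at the very top of the script, before importing gpu4pyscf:
#   set_cupy_memory_limit_env(80)
```

The idea behind the suggestion is that memory left outside the pool remains available for the cuBLAS handle and its workspace; the right percentage depends on the GPU and the size of the calculation.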
Hi, I am trying to replicate the example https://github.com/pyscf/gpu4pyscf/blob/master/examples/00-h2o.py using a benzene molecule instead of water. I obtain the same error when I run the https://github.com/pyscf/gpu4pyscf/blob/master/examples/07-transition_state.py example with the molecule defined in that file.
The error that I obtain is:
#########################################################################################
Traceback (most recent call last):
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/df/df_jk.py", line 63, in init_workflow
    rks.initialize_grids(mf, mf.mol, dm0)
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/dft/rks.py", line 83, in initialize_grids
    ks.grids = prune_small_rhogrids(ks, ks.mol, dm, ks.grids)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/dft/rks.py", line 39, in prune_small_rhogrids
    rho = ks._numint.get_rho(mol, dm, grids, ks.max_memory)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/dft/numint.py", line 721, in get_rho
    rho[p0:p1] = eval_rho2(mol, ao_mask, mo_coeff_mask, mo_occ, None, 'LDA', with_lapl)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/dft/numint.py", line 200, in eval_rho2
    c0 = _dot_ao_dm(mol, ao, cpos, non0tab, shls_slice, ao_loc)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/dft/numint.py", line 1476, in _dot_ao_dm
    return cupy.dot(dm.T, ao)
           ^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/cupy/linalg/_product.py", line 63, in dot
    return a.dot(b, out)
           ^^^^^^^^^^^^^
  File "cupy/_core/core.pyx", line 1757, in cupy._core.core._ndarray_base.dot
  File "cupy/_core/_routines_linalg.pyx", line 536, in cupy._core._routines_linalg.dot
  File "cupy/_core/_routines_linalg.pyx", line 626, in cupy._core._routines_linalg.tensordot_core
  File "cupy/_core/_routines_linalg.pyx", line 763, in cupy._core._routines_linalg.tensordot_core_v11
  File "cupy_backends/cuda/libs/cublas.pyx", line 1426, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 1454, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 438, in cupy_backends.cuda.libs.cublas.check_status
cupy_backends.cuda.libs.cublas.CUBLASError: CUBLAS_STATUS_NOT_INITIALIZED
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/pyscf/lib/misc.py", line 1104, in __exit__
    handler.result()
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/df/df_jk.py", line 43, in build_df
    mf.with_df.build()
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/df/df.py", line 90, in build
    self._cderi = cholesky_eri_gpu(intopt, mol, auxmol, self.cd_low, omega=omega)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/df/df.py", line 265, in cholesky_eri_gpu
    cderi_block = solve_triangular(cd_low, ints_slices, lower=True, overwrite_b=False)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/cupyx/scipy/linalg/_solve_triangular.py", line 97, in solve_triangular
    trsm(
  File "cupy_backends/cuda/libs/cublas.pyx", line 1109, in cupy_backends.cuda.libs.cublas.dtrsm
  File "cupy_backends/cuda/libs/cublas.pyx", line 1119, in cupy_backends.cuda.libs.cublas.dtrsm
  File "cupy_backends/cuda/libs/cublas.pyx", line 438, in cupy_backends.cuda.libs.cublas.check_status
cupy_backends.cuda.libs.cublas.CUBLASError: CUBLAS_STATUS_EXECUTION_FAILED
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/soralakers96/CODE/gpu4pyscf/gpu4pyscf/examples/07-transition_state.py", line 68, in <module>
    mf_GPU.kernel()
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/scf/hf.py", line 588, in scf
    _kernel(mf, mf.conv_tol, mf.conv_tol_grad,
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/scf/hf.py", line 404, in _kernel
    mf.init_workflow(dm0=dm)
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/gpu4pyscf/df/df_jk.py", line 56, in init_workflow
    with lib.call_in_background(build_df) as build:
  File "/home/soralakers96/anaconda3/envs/trail_actmol/lib/python3.12/site-packages/pyscf/lib/misc.py", line 1106, in __exit__
    raise ThreadRuntimeError('Error on thread %s:\n%s' % (self, e))
pyscf.lib.misc.ThreadRuntimeError: Error on thread <pyscf.lib.misc.call_in_background object at 0x7f5772b63dd0>:
CUBLAS_STATUS_EXECUTION_FAILED
########################################################################################
I am using an NVIDIA L40 with the pre-compiled package, installed with pip3 install gpu4pyscf-cuda12x.