pyscf / gpu4pyscf

A plugin to use Nvidia GPU in PySCF package
GNU General Public License v3.0

CUBLAS_STATUS_NOT_INITIALIZED #164

Open gpwood opened 1 month ago

gpwood commented 1 month ago

Hello, I just installed this package on an A10G with CUDA 12:

    [gwood@gaia-single-gpu-dy-g5-4xlarge-1 ~]$ nvidia-smi
    Tue Jun  4 12:15:18 2024       
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
    |  0%   24C    P8              22W / 300W |      4MiB / 23028MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+

When I run a simple example:

import pyscf
from pyscf.dft import rks

atom ='''
O       0.0000000000    -0.0000000000     0.1174000000
H      -0.7570000000    -0.0000000000    -0.4696000000
H       0.7570000000     0.0000000000    -0.4696000000
'''

mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit().to_gpu()  # move PySCF object to GPU4PySCF object
e_dft = mf.kernel()  # compute total energy

I get the following error:

         ~~~~~~^~~~~~~~~~~~~~~~~~
  File "cupy/_core/core.pyx", line 1289, in cupy._core.core._ndarray_base.__matmul__
  File "cupy/_core/_routines_linalg.pyx", line 846, in cupy._core._routines_linalg.matmul
  File "cupy/_core/_routines_linalg.pyx", line 536, in cupy._core._routines_linalg.dot
  File "cupy/_core/_routines_linalg.pyx", line 626, in cupy._core._routines_linalg.tensordot_core
  File "cupy/_core/_routines_linalg.pyx", line 763, in cupy._core._routines_linalg.tensordot_core_v11
  File "cupy_backends/cuda/libs/cublas.pyx", line 1426, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 1454, in cupy_backends.cuda.libs.cublas.gemmEx
  File "cupy_backends/cuda/libs/cublas.pyx", line 438, in cupy_backends.cuda.libs.cublas.check_status
cupy_backends.cuda.libs.cublas.CUBLASError: CUBLAS_STATUS_NOT_INITIALIZED

I'm running Python 3.11.9, any ideas?

wxj6000 commented 1 month ago

This could be an incompatibility among the CUDA driver, the CUDA toolkit, and cupy. Can you also post the output of nvcc --version?
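The versions in question can also be collected from Python. The following is a minimal sketch, assuming cupy imports successfully in the affected environment; `decode_cuda_version` and `report_versions` are hypothetical helper names for illustration, not part of cupy or GPU4PySCF. CUDA reports versions as integers of the form 1000*major + 10*minor.

```python
def decode_cuda_version(code):
    """Turn CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return code // 1000, (code % 1000) // 10


def report_versions():
    """Print the CUDA versions that cupy sees; needs a working cupy install."""
    import cupy  # imported lazily so decode_cuda_version works without cupy

    runtime = decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    driver = decode_cuda_version(cupy.cuda.runtime.driverGetVersion())
    print("cupy version:", cupy.__version__)
    print("CUDA runtime seen by cupy: %d.%d" % runtime)
    print("CUDA version supported by the driver: %d.%d" % driver)
```

Calling `report_versions()` in the same virtual environment as the failing script, next to the output of nvcc --version, should show whether the three components disagree.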

gpwood commented 1 month ago

This is the output:

(/exs/shared/collaboration/teams/qmteam/shared/gwood/pyqc/.venv) [gwood@gaia-single-gpu-dy-g5-4xlarge-1 pyqc]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

I'm loading it through spack

 spack load --best-arch cuda

wxj6000 commented 1 month ago

It seems that you are using CUDA toolkit v11 (nvcc reports release 11.1), so you will need to install gpu4pyscf-cuda11x. Thank you for your feedback. We should state the CUDA version requirement more clearly in the installation instructions.
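As a rough illustration of that mapping, the sketch below reads the toolkit release out of nvcc --version text and suggests a package name. The helper names are hypothetical, and the mapping covers only the two package names mentioned in this thread (gpu4pyscf-cuda11x and gpu4pyscf-cuda12x).

```python
import re


def toolkit_major(nvcc_output):
    """Return the CUDA toolkit major version from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return int(match.group(1)) if match else None


def suggested_package(nvcc_output):
    """Map the toolkit major version to a package name from this thread."""
    major = toolkit_major(nvcc_output)
    if major in (11, 12):
        return "gpu4pyscf-cuda%dx" % major
    return None  # any other toolkit version is outside what this thread covers


# The relevant line from the nvcc output posted above.
sample = "Cuda compilation tools, release 11.1, V11.1.105"
print(suggested_package(sample))
```

Note that the "CUDA Version: 12.2" shown by nvidia-smi is what the driver supports, while nvcc reports the installed toolkit, which is why the two can differ.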

gpwood commented 1 month ago

OK, thank you. Does the installation of gpu4pyscf include cupy? I followed the install instructions as written, using the cuda11 versions as recommended, but now get this error:

  File "/exs/shared/collaboration/teams/qmteam/shared/gwood/pyqc/.venv/lib/python3.11/site-packages/gpu4pyscf/lib/diis.py", line 25, in <module>
    import cupy
ModuleNotFoundError: No module named 'cupy'

I've just tried to install this with pip3 but it fails:

        File "/tmp/pip-install-ct08rl2u/cupy_03a99d85276d427f869a5ec942870dbd/install/cupy_builder/_compiler.py", line 148, in _nvcc_gencode_options
          assert False
                 ^^^^^
      AssertionError
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cupy
  Running setup.py clean for cupy
Failed to build cupy
ERROR: Could not build wheels for cupy, which is required to install pyproject.toml-based projects

wxj6000 commented 1 month ago

@gpwood GPU4PySCF does include cupy as a dependency. Since you have installed gpu4pyscf-cuda12x before, pip probably did not install cupy for you again. You will need to uninstall gpu4pyscf-cuda12x and cupy-cuda12x completely via

pip3 uninstall gpu4pyscf-cuda12x
pip3 uninstall cupy-cuda12x

Then run pip3 install gpu4pyscf-cuda11x.

If you want to install cupy separately, use pip3 install cupy-cuda11x. A plain pip3 install cupy builds cupy from its source code, which will generally fail.
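Before reinstalling, it may help to check which cupy and gpu4pyscf distributions are already present in the environment. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the prefix matching is an assumption based on the package names in this thread.

```python
from importlib import metadata

PREFIXES = ("cupy", "gpu4pyscf")


def matching(names, prefixes=PREFIXES):
    """Return sorted, de-duplicated, normalized names starting with a prefix."""
    normalized = {n.lower().replace("_", "-") for n in names if n}
    return sorted(n for n in normalized if n.startswith(prefixes))


def installed_names():
    """List the names of all distributions visible to this interpreter."""
    return [dist.metadata.get("Name") for dist in metadata.distributions()]


found = matching(installed_names())
print("Installed:", found)

# Collect the CUDA suffixes (for example cuda11x, cuda12x) of those packages.
cuda_tags = {name.rsplit("-", 1)[-1] for name in found if "-cuda" in name}
if len(cuda_tags) > 1:
    print("Mixed CUDA builds installed:", sorted(cuda_tags))
```

If the list shows packages for more than one CUDA version, uninstalling the ones that do not match the toolkit, as described above, is the likely next step.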