pyscf / gpu4pyscf

A plugin to use NVIDIA GPUs in the PySCF package
GNU General Public License v3.0

Unable to compile in Docker image #126

Open · Svennemans opened this issue 5 months ago

Svennemans commented 5 months ago

Hi,

I cannot get gpu4pyscf to compile inside the environment generated through 'dockerfiles/compile/Dockerfile'

Environment:

uname -a:
Linux 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off | 00000000:01:00.0  On |                  N/A |
| 30%   60C    P2             175W / 275W |   5021MiB / 11264MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Issues:

  1. Unclear instructions on getting started

Based on assumptions and trial and error, I followed this scenario:

=> Are these steps correct? Or am I missing something already?

I suggest adding some getting-started compilation instructions to the README file.

  2. Compile failed due to a missing Fortran compiler (no Fortran compiler found in the Docker environment)

    Actions taken: apt-get update && apt-get install gfortran

=> Is this the correct Fortran compiler?

I suggest adding the proper Fortran compiler to the docker 'compile' environment, or adding instructions for selecting/installing the correct compiler.

  3. Compile failed due to undefined references:

Errors are below; I haven't yet been able to find out what I am missing (I sketch a possible fix after the log).

Scanning dependencies of target multicharge-exe
[ 65%] Building Fortran object _deps/multicharge-build/app/CMakeFiles/multicharge-exe.dir/main.f90.o
[ 67%] Linking Fortran executable multicharge
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_dgemm':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:520: undefined reference to `dgemm_'
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_sgemm':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:475: undefined reference to `sgemm_'
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_dsymv':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:430: undefined reference to `dsymv_'
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_ssymv':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:397: undefined reference to `ssymv_'
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_dgemv':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:364: undefined reference to `dgemv_'
/usr/bin/ld: ../libmulticharge.a(blas.F90.o): in function `__multicharge_blas_MOD_mchrg_sgemv':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/blas.F90:330: undefined reference to `sgemv_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_dsytri':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:353: undefined reference to `dsytri_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:353: undefined reference to `dsytri_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:353: undefined reference to `dsytri_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_ssytri':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:322: undefined reference to `ssytri_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:322: undefined reference to `ssytri_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:322: undefined reference to `ssytri_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_dsytrs':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:247: undefined reference to `dsytrs_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_ssytrs':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:221: undefined reference to `ssytrs_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_dsytrf':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:183: undefined reference to `dsytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:190: undefined reference to `dsytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:183: undefined reference to `dsytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:190: undefined reference to `dsytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:183: undefined reference to `dsytrf_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o):/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:183: more undefined references to `dsytrf_' follow
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o): in function `__multicharge_lapack_MOD_mchrg_ssytrf':
/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:144: undefined reference to `ssytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:151: undefined reference to `ssytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:144: undefined reference to `ssytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:151: undefined reference to `ssytrf_'
/usr/bin/ld: /usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:144: undefined reference to `ssytrf_'
/usr/bin/ld: ../libmulticharge.a(lapack.F90.o):/usr/local/lib/python3.9/dist-packages/pyscf/gpu4pyscf/build/temp.linux-x86_64/gpu4pyscf/deps/src/dftd4_static-build/_deps/multicharge-src/src/multicharge/lapack.F90:144: more undefined references to `ssytrf_' follow
collect2: error: ld returned 1 exit status
gmake[5]: *** [_deps/multicharge-build/app/CMakeFiles/multicharge-exe.dir/build.make:105: _deps/multicharge-build/app/multicharge] Error 1
gmake[4]: *** [CMakeFiles/Makefile2:489: _deps/multicharge-build/app/CMakeFiles/multicharge-exe.dir/all] Error 2
gmake[3]: *** [Makefile:160: all] Error 2
gmake[2]: *** [CMakeFiles/dftd4_static.dir/build.make:130: deps/src/dftd4_static-stamp/dftd4_static-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:211: CMakeFiles/dftd4_static.dir/all] Error 2
gmake: *** [Makefile:103: all] Error 2
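
If it helps, here is a minimal sketch of what I suspect the compile image is missing: the gfortran install above plus a system BLAS/LAPACK to provide the dgemm_/dsytrf_ symbols the linker complains about. This is an assumption on my side, not a confirmed fix, and the package names are just the usual Debian/Ubuntu ones:

# assumed additions to the environment of dockerfiles/compile/Dockerfile;
# any BLAS/LAPACK exposing the reference interface should satisfy the linker
apt-get update && apt-get install -y gfortran libopenblas-dev liblapack-dev
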
wxj6000 commented 5 months ago

@Svennemans Thanks for the feedback. The instructions and the dockerfile were not updated after the compilation changes; they are updated now in the master branch.

BTW, if you find that adding any additional information to the instructions would be useful for others, please feel free to create a PR.

https://github.com/pyscf/gpu4pyscf/pull/127

Svennemans commented 5 months ago

@wxj6000 Thanks, the good news is that with the new dockerfile and compile steps, it does indeed compile.

The new instructions do not create the necessary wheels to "pip install" in another location, though, which I think was possible with the old build.sh/setup.py. How can this be achieved?

I tried to test inside the docker. When I run the following inside the docker:

import pyscf
from gpu4pyscf.dft import rks

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/pyscf/gpu4pyscf/gpu4pyscf/__init__.py", line 1, in <module>
    from . import lib, grad, hessian, solvent, scf, dft
  File "/usr/local/lib/python3.10/dist-packages/pyscf/gpu4pyscf/gpu4pyscf/lib/__init__.py", line 19, in <module>
    from gpu4pyscf.lib import cupy_helper
  File "/usr/local/lib/python3.10/dist-packages/pyscf/gpu4pyscf/gpu4pyscf/lib/cupy_helper.py", line 28, in <module>
    from gpu4pyscf.lib.cusolver import eigh, cholesky  #NOQA
  File "/usr/local/lib/python3.10/dist-packages/pyscf/gpu4pyscf/gpu4pyscf/lib/cusolver.py", line 25, in <module>
    _handle = device.get_cusolver_handle()
  File "cupy/cuda/device.pyx", line 65, in cupy.cuda.device.get_cusolver_handle
  File "cupy/cuda/device.pyx", line 66, in cupy.cuda.device.get_cusolver_handle
  File "cupy/cuda/device.pyx", line 44, in cupy.cuda.device._get_device
  File "cupy_backends/cuda/api/runtime.pyx", line 202, in cupy_backends.cuda.api.runtime.getDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version

But anyway I do not want to run gpu4pyscf inside the compile docker, so I need to get a working install outside docker.

When I copy gpu4pyscf outside the docker into a conda env with pyscf and scipy/numpy/h5py/... preinstalled and try to run it there, I get the following error:

>>> import pyscf
>>> from gpu4pyscf.dft import rks
/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/_environment.py:369: UserWarning: CuPy failed to preload library (/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cutensor/lib/libcutensor.so.2): OSError (libcublasLt.so.11: cannot open shared object file: No such file or directory)
  warnings.warn(msg)
/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/lib/cutensor.py:138: UserWarning: using cupy as the tensor contraction engine.
  warnings.warn(f'using {contract_engine} as the tensor contraction engine.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/__init__.py", line 1, in <module>
    from . import lib, grad, hessian, solvent, scf, dft
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/lib/__init__.py", line 19, in <module>
    from gpu4pyscf.lib import cupy_helper
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/lib/cupy_helper.py", line 28, in <module>
    from gpu4pyscf.lib.cusolver import eigh, cholesky  #NOQA
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/lib/cusolver.py", line 21, in <module>
    from cupy_backends.cuda.libs import cusolver
ImportError: libcusolver.so.11: cannot open shared object file: No such file or directory
wxj6000 commented 5 months ago

@Svennemans For the first issue, you would need nvidia-docker run ... instead of plain docker run.
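
(If you are on a recent Docker with the NVIDIA Container Toolkit installed, the equivalent is usually the --gpus flag; the image name below is only a placeholder:)

docker run --gpus all -it <your-compile-image> bash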

For the second issue, the libraries compiled in docker generally do not work in a different environment. If you would like to run gpu4pyscf outside the docker environment, you can build it in your current environment: collect the build tools listed in gpu4pyscf/dockerfiles/compile/Dockerfile and run the scripts in the instructions. You won't need to build the wheel and install it. The following lines from the instructions will let Python find gpu4pyscf:

CURRENT_PATH=$(pwd)
export PYTHONPATH="${PYTHONPATH}:${CURRENT_PATH}"

There is no simple instruction for building a GPU package for different environments. The dockerfile was created for a "standard" environment, so be prepared for some issues.

Svennemans commented 5 months ago

@wxj6000 I'm curious then, how did you compile the binary packages that one can install without compiling?

wxj6000 commented 5 months ago

@Svennemans The binary packages are built in a manylinux environment. You can find the dockerfile and the build script here: https://github.com/pyscf/gpu4pyscf/tree/master/dockerfiles/manylinux You will also need to build CUDA libxc, which has to be compiled into a separate package because we have to control the total package size: https://github.com/pyscf/gpu4pyscf/tree/master/builder
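
(As a generic illustration of the wheel step only, not the project's actual manylinux script: once the native libraries build in your environment, pip itself can produce and install a wheel from the source tree.)

pip wheel --no-deps -w dist .
pip install dist/*.whl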

sef43 commented 5 months ago

In case it is helpful, I made a singularity image to build a version of this plugin for older compute capability and CUDA 11. You can build the binary wheel in the singularity shell and then pip install it wherever you want: https://github.com/sef43/gpu4pyscf/tree/master?tab=readme-ov-file#updated-for-compute-capability-60-and-build-loally-with-singularity-image

Svennemans commented 5 months ago

Thanks for the suggestions @wxj6000 and @sef43, I'll try those for sure. Meanwhile, I've installed the CUDA toolkit and the import statements now work fine for my compiled code. However, I hit the next snag while continuing the "readme" example in an interactive Python shell:

import pyscf
from gpu4pyscf.dft import rks

atom ='''
O       0.0000000000    -0.0000000000     0.1174000000
H      -0.7570000000    -0.0000000000    -0.4696000000
H       0.7570000000     0.0000000000    -0.4696000000
'''

mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit()

e_dft = mf.kernel()  # compute total energy

The last statement throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/scf/hf.py", line 583, in scf
    mf.build(mf.mol)
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/pyscf/scf/hf.py", line 1580, in build
    self.check_sanity()
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/pyscf/scf/hf.py", line 2112, in check_sanity
    return SCF.check_sanity(self)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/pyscf/scf/hf.py", line 1570, in check_sanity
    cond = lib.cond(s1e)
           ^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/pyscf/lib/numpy_helper.py", line 923, in cond
    return numpy.asarray([numpy.linalg.cond(xi) for xi in x])
                          ^^^^^^^^^^^^^^^^^^^^^
TypeError: no implementation found for 'numpy.linalg.cond' on types that implement __array_function__: [<class 'cupy.ndarray'>]

An initial search seemed to indicate this may be a mismatch between cupy/numpy versions. For reference, this is my list of installed packages:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
bzip2                     1.0.8                h5eee18b_5  
ca-certificates           2024.3.11            h06a4308_0  
cupy-cuda12x              13.0.0                   pypi_0    pypi
cutensor-cu12             2.0.1                    pypi_0    pypi
expat                     2.5.0                h6a678d5_0  
fastrlock                 0.8.2                    pypi_0    pypi
geometric                 1.0.2                    pypi_0    pypi
h5py                      3.10.0                   pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
ncurses                   6.4                  h6a678d5_0  
networkx                  3.2.1                    pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
openssl                   3.0.13               h7f8727e_0  
pip                       23.3.1          py312h06a4308_0  
pyscf                     2.5.0                    pypi_0    pypi
python                    3.12.2               h996f2a0_0  
readline                  8.2                  h5eee18b_0  
scipy                     1.12.0                   pypi_0    pypi
setuptools                68.2.2          py312h06a4308_0  
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
tk                        8.6.12               h1ccaba5_0  
tzdata                    2024a                h04d1e81_0  
wheel                     0.41.2          py312h06a4308_0  
xz                        5.4.6                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0

Version of the CUDA toolkit (latest) = 12.4. Do you see any incompatibilities?
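
(For completeness, a quick one-liner to print the versions involved; nothing project-specific is assumed here:)

python -c "import numpy, scipy, cupy, pyscf; print(numpy.__version__, scipy.__version__, cupy.__version__, pyscf.__version__)"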

wxj6000 commented 5 months ago

I guess you also compiled the latest pyscf. The master branch on GitHub introduced some incompatibilities with gpu4pyscf. You can either pip install pyscf==2.5.0 --upgrade --force-reinstall or use any older version from before this PR (https://github.com/pyscf/pyscf/pull/2078).

BTW, we just released gpu4pyscf v0.7.5. The new release supports older GPUs such as the GTX 1080. You can pip3 install gpu4pyscf-cuda11x or pip3 install gpu4pyscf-cuda12x.

Svennemans commented 5 months ago

OK, pip install pyscf==2.5.0 --upgrade --force-reinstall fixed the TypeError. Thanks. After that I got a CudaError, but I was expecting that, as I indeed have a compute capability 6.1 card (GTX 1080 Ti).

Great news that gpu4pyscf v0.7.5 now supports compute 6+! I will update and recompile to see if that then fixes the CudaError.

FYI: my long-term goal is to see if it's possible to create a working pyscf/gpu4pyscf combo on Windows, which is why I'm mainly interested in understanding the compilation steps rather than just getting a working version on my Linux machine.

Svennemans commented 5 months ago

So I recompiled the cuda v12 code again with gpu4pyscf v0.7.5.

Running the example once again, the previous CudaError was gone, but now I get the error below, which explicitly mentions an incompatibility of compute architectures below sm_70 with 'CUDA synchronization primitives'...

Did you have a chance to check the binary images you created against a card with compute capability below 7.0? I'll test them just the same to see if they reproduce the error below.

>>> e_dft = mf.kernel()  # compute total energy
---------------------------------------------------
--- JIT compile log for cupy_jitify_exercise ---
---------------------------------------------------
cub/util_cpp_dialect.cuh(143): warning #161-D: unrecognized #pragma
       CUB_COMPILER_DEPRECATION_SOFT(C++14, C++11);
       ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

std/barrier(16): catastrophic error: #error directive: "CUDA synchronization primitives are only supported for sm_70 and up."
  #  error "CUDA synchronization primitives are only supported for sm_70 and up."
     ^

1 catastrophic error detected in the compilation of "cupy_jitify_exercise".
Compilation terminated.

---------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/scf/hf.py", line 588, in scf
    _kernel(mf, mf.conv_tol, mf.conv_tol_grad,
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/scf/hf.py", line 404, in _kernel
    mf.init_workflow(dm0=dm)
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/df/df_jk.py", line 63, in init_workflow
    rks.initialize_grids(mf, mf.mol, dm0)
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/dft/rks.py", line 83, in initialize_grids
    ks.grids = prune_small_rho_grids_(ks, ks.mol, dm, ks.grids)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/gpu4pyscf/gpu4pyscf/dft/rks.py", line 52, in prune_small_rho_grids_
    logger.debug(grids, 'Drop grids %d', grids.weights.size - cupy.count_nonzero(idx))
                                                              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/_sorting/count.py", line 24, in count_nonzero
    return _count_nonzero(a, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cupy/_core/_reduction.pyx", line 618, in cupy._core._reduction._SimpleReductionKernel.__call__
  File "cupy/_core/_reduction.pyx", line 370, in cupy._core._reduction._AbstractReductionKernel._call
  File "cupy/_core/_cub_reduction.pyx", line 689, in cupy._core._cub_reduction._try_to_call_cub_reduction
  File "cupy/_core/_cub_reduction.pyx", line 526, in cupy._core._cub_reduction._launch_cub
  File "cupy/_core/_cub_reduction.pyx", line 461, in cupy._core._cub_reduction._cub_two_pass_launch
  File "cupy/_util.pyx", line 64, in cupy._util.memoize.decorator.ret
  File "cupy/_core/_cub_reduction.pyx", line 240, in cupy._core._cub_reduction._SimpleCubReductionKernel_get_cached_function
  File "cupy/_core/_cub_reduction.pyx", line 223, in cupy._core._cub_reduction._create_cub_reduction_function
  File "cupy/_core/core.pyx", line 2254, in cupy._core.core.compile_with_cache
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/cuda/compiler.py", line 484, in _compile_module_with_cache
    return _compile_with_cache_cuda(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/cuda/compiler.py", line 562, in _compile_with_cache_cuda
    ptx, mapping = compile_using_nvrtc(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/cuda/compiler.py", line 319, in compile_using_nvrtc
    return _compile(source, options, cu_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/cuda/compiler.py", line 284, in _compile
    options, headers, include_names = _jitify_prep(
                                      ^^^^^^^^^^^^^
  File "/home/svennemans/Development/Python/CondaEnvs/PySCF/lib/python3.12/site-packages/cupy/cuda/compiler.py", line 233, in _jitify_prep
    jitify._init_module()
  File "cupy/cuda/jitify.pyx", line 212, in cupy.cuda.jitify._init_module
  File "cupy/cuda/jitify.pyx", line 233, in cupy.cuda.jitify._init_module
  File "cupy/cuda/jitify.pyx", line 209, in cupy.cuda.jitify._init_cupy_headers
  File "cupy/cuda/jitify.pyx", line 192, in cupy.cuda.jitify._init_cupy_headers_from_scratch
  File "cupy/cuda/jitify.pyx", line 264, in cupy.cuda.jitify.jitify
RuntimeError: Runtime compilation failed
Svennemans commented 5 months ago

@wxj6000 FYI, I get the exact same catastrophic error when running the binary downloads, both the cuda12x and cuda11x versions.

std/barrier(16): catastrophic error: #error directive: "CUDA synchronization primitives are only supported for sm_70 and up."
  #  error "CUDA synchronization primitives are only supported for sm_70 and up."
Svennemans commented 5 months ago

@wxj6000 This might give an idea of what is wrong as of CuPy v13: https://github.com/cupy/cupy/issues/8184

Svennemans commented 5 months ago

Success!!

I downgraded cupy-cuda to v12.3.0 and now it runs fine. Tested with my own compiled gpu4pyscf (CUDA 12) and with both binary downloads (gpu4pyscf-cuda12x / gpu4pyscf-cuda11x).

Suggestion: change the package dependency for the binary downloads from cupy-cuda v13.0.0 to cupy-cuda v12.3.0 until CuPy v13.1.0 is released.
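
For anyone hitting the same error, the downgrade is just the following (use cupy-cuda11x analogously for a CUDA 11 install):

pip install cupy-cuda12x==12.3.0 --force-reinstall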

wxj6000 commented 5 months ago

@Svennemans Thank you for bringing up the issue and testing the code. Some modules of the code use new APIs introduced in CuPy v13.0, so switching to CuPy v12.3 will break some functionality. I am removing the version requirement on CuPy so that people can switch to different versions based on their use case.

The release of CuPy v13.1 is scheduled for this month. The issue should be resolved by then.

Svennemans commented 5 months ago

@wxj6000 When trying to build wheels using the docker images, build_dockers.sh fails:

737.6 ./install_cuda.sh: line 149: prune_118: command not found
------
Dockerfile:31
--------------------
  29 |     ARG BASE_CUDA_VERSION=11.8
  30 |     ADD install_cuda.sh install_cuda.sh
  31 | >>> RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
  32 |     
  33 |     ENV CUDA_HOME="/usr/local/cuda" LD_LIBRARY_PATH="${CUDA_HOME}/lib64::${LD_LIBRARY_PATH}"
--------------------
ERROR: failed to solve: process "/bin/sh -c bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh" did not complete successfully: exit code: 127
wxj6000 commented 5 months ago

@Svennemans Just remove the prune_118 call in install_cuda.sh.
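
For example, something like the line below should do it, assuming prune_118 only appears on the offending line(s) near the end of the script:

sed -i '/prune_118/d' install_cuda.sh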

The image is available in docker.io (wxj6000/manylinux2014:cuda118)