Closed Zakk-Yang closed 8 months ago

Zakk-Yang commented 11 months ago


I installed RAPIDS in a WSL2 environment and get the following error when importing cudf:

CudaSupportError: Error at driver init: 
Call to cuInit results in CUDA_ERROR_NO_DEVICE (100):

Full error code:

CudaAPIError                              Traceback (most recent call last)
File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py:258, in Driver.ensure_initialized(self)
    257     _logger.info('init')
--> 258     self.cuInit(0)
    259 except CudaAPIError as e:

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py:331, in Driver._ctypes_wrap_fn.<locals>.safe_cuda_api_call(*args)
    330 retcode = libfn(*args)
--> 331 self._check_ctypes_error(fname, retcode)

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py:399, in Driver._check_ctypes_error(self, fname, retcode)
    398     self._detect_fork()
--> 399 raise CudaAPIError(retcode, msg)

CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

CudaSupportError                          Traceback (most recent call last)
/mnt/d/learn-rapids/Untitled.ipynb Cell 1 line 2
     22 from numba import cuda
     24 print("Allocating array")
---> 26 cuda.device_array(1)
     28 print("Finished")
--> 262     raise CudaSupportError(f"Error at driver init: {description}")
    263 else:
    264     self.pid = _getpid()

CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
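
The failing step in the traceback is the `cuInit(0)` call. As a rough sketch, the same call can be made directly through `ctypes`, without Numba, to see which return code a particular `libcuda.so.1` gives. The helper name `try_cuinit` is hypothetical, and the default path `/usr/lib/wsl/lib/libcuda.so.1` is the WSL driver location that shows up later in this thread; it is an assumption that this is the right library to test on your machine.

```python
import ctypes
import os

CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100  # the code reported in the traceback above


def try_cuinit(path="/usr/lib/wsl/lib/libcuda.so.1"):
    """Load a CUDA driver library and call cuInit(0).

    Returns (return_code, message); return_code is None if the
    library could not be loaded at all.
    """
    if not os.path.exists(path):
        return None, f"{path} does not exist"
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        return None, f"could not load {path}: {exc}"
    code = lib.cuInit(0)
    if code == CUDA_SUCCESS:
        return code, "cuInit succeeded"
    if code == CUDA_ERROR_NO_DEVICE:
        return code, "CUDA_ERROR_NO_DEVICE: driver loaded but reports no GPU"
    return code, f"cuInit failed with code {code}"
```

Calling `try_cuinit()` with different library paths may help show whether the process is picking up the WSL-provided driver or some other copy of `libcuda`.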

nvidia-smi has the following info in wsl:

(rapids-23.12) zy-wsl@yjl-dl:/mnt/c/Users/zakky$ nvidia-smi
Sat Oct 28 17:36:40 2023
| NVIDIA-SMI 545.23.05              Driver Version: 545.84       CUDA Version: 12.3     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA RTX A6000               On  | 00000000:01:00.0  On |                  Off |
| 30%   48C    P5              40W / 300W |   2003MiB / 49140MiB |     40%      Default |
|                                         |                      |                  N/A |

| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A        23      G   /Xwayland                                 N/A      |
|    0   N/A  N/A      7498      C   /python3.10                               N/A      |

Installation command used:

conda create --solver=libmamba -n rapids-23.12 -c rapidsai-nightly -c conda-forge -c nvidia  \
    cudf=23.12 cuml=23.12 python=3.10 cuda-version=12.0 \

Output of numba -s in the WSL environment:

(rapids-23.12) zy-wsl@yjl-dl:/mnt/c/Users/zakky$ numba -s
System info:
/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/np/ufunc/parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12050. The TBB threading layer is disabled.
__Time Stamp__
Report started (local time)                   : 2023-10-28 17:21:28.731152
UTC start time                                : 2023-10-28 16:21:28.731155
Running time (s)                              : 1.117497

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : alderlake
CPU Count                                     : 24
Number of accessible CPUs                     : 24
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 avxvnni bmi
                                                bmi2 clflushopt clwb cmov crc32
                                                cx16 cx8 f16c fma fsgsbase fxsr
                                                gfni invpcid lzcnt mmx movbe
                                                movdir64b movdiri pclmul popcnt
                                                prfchw rdpid rdrnd rdseed sahf
                                                serialize sha shstk sse sse2 sse3
                                                sse4.1 sse4.2 ssse3 vaes
                                                vpclmulqdq waitpkg xsave xsavec
                                                xsaveopt xsaves

Memory Total (MB)                             : 31943
Memory Available (MB)                         : 29762

__OS Information__
Platform Name                                 : Linux-
Platform Release                              :
OS Name                                       : Linux
OS Version                                    : #1 SMP Fri Jan 27 02:56:13 UTC 2023
OS Specific Version                           : ?
Libc Version                                  : glibc 2.35

__Python Information__
Python Compiler                               : GCC 12.3.0
Python Implementation                         : CPython
Python Version                                : 3.10.13
Python Locale                                 : en_US.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.57.1
llvmlite Version                              : 0.40.1

__LLVM Information__
LLVM Version                                  : 14.0.6

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
CUDA Libraries Test Output:

__NumPy Information__
NumPy Version                                 : 1.24.4
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : False
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : not installed
Conda Env                                     : 23.5.2
Conda Platform                                : linux-64
Conda Python Version                          : 3.11.4.final.0
Conda Root Writable                           : True

__Installed Packages__
No errors reported.

__Warning log__
Warning (cuda): CUDA device initialisation problem. Message:Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as

IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.

Objective: Seeking help in resolving the CUDA initialization error when trying to import cudf in the WSL2 environment.

bdice commented 11 months ago

Thanks for the detail in your report above. Most of your environment setup seems right. Did you install any CUDA packages into the WSL environment, outside of conda? Your WSL instance should not have a CUDA driver installed, or else it will cause problems. Only the Windows host system should provide the driver. Maybe post the output of apt list?
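
A long `apt list` can be narrowed to the packages that matter here. The sketch below is one hypothetical way to do that in Python: it keeps only the lines of `apt list --installed` whose package name mentions a few keywords. The keyword tuple and the helper names `filter_gpu_packages` and `installed_gpu_packages` are assumptions made for illustration, not an official list of conflicting packages.

```python
import subprocess

# Assumed keywords; adjust as needed. Not an official list.
KEYWORDS = ("nvidia", "cuda")


def filter_gpu_packages(lines, keywords=KEYWORDS):
    """Return the lines whose package name contains any keyword."""
    matches = []
    for line in lines:
        # `apt list` prints entries as "name/suite version arch [status]".
        name = line.split("/", 1)[0].strip().lower()
        if name and any(key in name for key in keywords):
            matches.append(line.strip())
    return matches


def installed_gpu_packages():
    """Run `apt list --installed` and filter its output."""
    result = subprocess.run(
        ["apt", "list", "--installed"],
        capture_output=True, text=True, check=False,
    )
    return filter_gpu_packages(result.stdout.splitlines())
```

Running `installed_gpu_packages()` inside the WSL instance gives a much shorter list to post. Any Linux-side driver package in that list would be worth mentioning, since only the Windows host should provide the driver.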

NTNguyen13 commented 10 months ago

Hi, I have the same problem (trying to use cudf on WSL2), but my numba -s output is a little bit different:

System info:
/home/nguyen/anaconda3/envs/rapids-23.10/lib/python3.10/site-packages/numba/np/ufunc/parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12050. The TBB threading layer is disabled.
__Time Stamp__
Report started (local time)                   : 2023-11-15 00:51:07.117163
UTC start time                                : 2023-11-14 17:51:07.117166
Running time (s)                              : 4.804779

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
CPU Count                                     : 16
Number of accessible CPUs                     : 16
List of accessible CPUs cores                 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2 bmi bmi2
                                                clflushopt cmov crc32 cx16 cx8
                                                f16c fma fsgsbase fxsr invpcid
                                                lzcnt mmx movbe pclmul popcnt
                                                prfchw rdrnd rdseed sahf sse sse2
                                                sse3 sse4.1 sse4.2 ssse3 xsave
                                                xsavec xsaveopt xsaves

Memory Total (MB)                             : 32066
Memory Available (MB)                         : 26497

__OS Information__
Platform Name                                 : Linux-
Platform Release                              :
OS Name                                       : Linux
OS Version                                    : #1 SMP Fri Apr 2 22:23:49 UTC 2021
OS Specific Version                           : ?
Libc Version                                  : glibc 2.35

__Python Information__
Python Compiler                               : GCC 12.3.0
Python Implementation                         : CPython
Python Version                                : 3.10.13
Python Locale                                 : en_US.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.57.1
llvmlite Version                              : 0.40.1

__LLVM Information__
LLVM Version                                  : 14.0.6

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : 12.2
CUDA Runtime Version                          : 11.8
CUDA NVIDIA Bindings Available                : True
CUDA NVIDIA Bindings In Use                   : False
CUDA Minor Version Compatibility Available    : True
CUDA Minor Version Compatibility Needed       : False
CUDA Minor Version Compatibility In Use       : False
CUDA Detect Output:
Found 1 CUDA devices
id 0    b'NVIDIA GeForce RTX 2080 Ti'                              [SUPPORTED]
                      Compute Capability: 7.5
                           PCI Device ID: 0
                              PCI Bus ID: 1
                                    UUID: GPU-01bc9f8e-d066-a5b1-bb43-007a711170ac
                                Watchdog: Enabled
             FP32/FP64 Performance Ratio: 32
        1/1 devices are supported

CUDA Libraries Test Output:
Finding driver from candidates: /usr/lib/wsl/lib/libcuda.so.1...
Using loader <class 'ctypes.CDLL'>
        trying to load driver...        ok, loaded from /usr/lib/wsl/lib/libcuda.so.1
Finding nvvm from Conda environment
        named  libnvvm.so.4.0.0
        trying to open library...       ok
Finding cudart from Conda environment
        named  libcudart.so.11.8.89
        trying to open library...       ok
Finding cudadevrt from Conda environment
        named  libcudadevrt.a
Finding libdevice from Conda environment
        trying to open library...       ok

__NumPy Information__
NumPy Version                                 : 1.24.4
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : False
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
NUMBA_CUDA_DRIVER                             : /usr/lib/wsl/lib/libcuda.so.1

__Conda Information__
Conda Build                                   : 3.27.0
Conda Env                                     : 23.10.0
Conda Platform                                : linux-64
Conda Python Version                          : 3.9.18.final.0
Conda Root Writable                           : True

__Installed Packages__
No errors reported.

__Warning log__
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us

I followed this installation guide from NVIDIA: https://docs.nvidia.com/cuda/wsl-user-guide/index.html. Other applications that use CUDA, such as Stable Diffusion, are working well.

Please comment on how I can provide more information if needed to troubleshoot this.

bdice commented 10 months ago

@NTNguyen13 Can you also post the result of apt list? Also take a look at this issue from Numba: https://github.com/numba/numba/issues/6777

NTNguyen13 commented 10 months ago

My apt list is quite long. Could you suggest some grep keywords to filter it? I tried the command NUMBA_CUDA_LOG_LEVEL=DEBUG python -c "import cudf; cudf.Series([1,2,3])" and the output is:

== CUDA (ptxcompiler) [653] DEBUG -- CUDA Driver version 12.2
== CUDA (ptxcompiler) [653] DEBUG -- CUDA Runtime version 11.8
== CUDA [803] DEBUG -- call runtime api: cudaRuntimeGetVersion
== CUDA [2490]  INFO -- init
== CUDA [2491] DEBUG -- call driver api: cuInit
== CUDA [2491] DEBUG -- call driver api: cuCtxGetCurrent
== CUDA [2491] DEBUG -- call driver api: cuDeviceGetCount
== CUDA [2492] DEBUG -- call driver api: cuDeviceGet
== CUDA [2492] DEBUG -- call driver api: cuDeviceGetAttribute
== CUDA [2492] DEBUG -- call driver api: cuDeviceGetAttribute
== CUDA [2492] DEBUG -- call driver api: cuDeviceGetName
== CUDA [2492] DEBUG -- call driver api: cuDeviceGetUuid_v2
== CUDA [2492] DEBUG -- call driver api: cuDevicePrimaryCtxRetain
== CUDA [2963] DEBUG -- call driver api: cuCtxPushCurrent_v2
== CUDA [2963] DEBUG -- call driver api: cuMemGetInfo_v2

Edit: After adding the export to ~/.bashrc AND restarting my VS Code runtime, it works fine now.
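
The restart matters because a process that is already running does not pick up later edits to `~/.bashrc`. A small sketch for confirming this from inside the notebook or script follows. It assumes the export in question is `NUMBA_CUDA_DRIVER` (the variable shown in the `numba -s` output above); the helper name `check_driver_env` is hypothetical.

```python
import os


def check_driver_env(env=None, name="NUMBA_CUDA_DRIVER"):
    """Report whether `name` is set and points at an existing file."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return f"{name} is not set in this process"
    if not os.path.isfile(value):
        return f"{name}={value} but that file does not exist"
    return f"{name}={value} (file found)"


print(check_driver_env())
```

If this prints "not set" in the notebook while the same variable is visible in a fresh terminal, the kernel or editor session was probably started before the export was added.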

Zakk-Yang commented 10 months ago

Hi, the problem was solved by adding the following lines to ~/.bashrc inside the WSL instance (nano ~/.bashrc):

export LD_LIBRARY_PATH="/usr/lib/wsl/lib/"
export NUMBA_CUDA_DRIVER="/usr/lib/wsl/lib/libcuda.so.1"
export LD_LIBRARY_PATH=/usr/local/cuda-12.3/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.3/bin:$PATH

shwina commented 10 months ago

@Zakk-Yang thanks for sharing your fix!

bdice commented 8 months ago

Closing as resolved.

quaid281 commented 1 week ago

I know this is closed, but I must say that installing CUDA and cuDNN on WSL is a nightmare, and the likely cause of this issue is that cuDF cannot find your CUDA installation. If anyone is facing this issue, best of luck in resolving it.