microsoft / WSL


PyTorch/Tensorflow not recognizing GPU on WSL - installed cudnn and cuda #8102

Closed · AH-Merii closed this issue 8 months ago

AH-Merii commented 2 years ago

Version

Microsoft Windows [Version 10.0.22000.527]

WSL Version

WSL 2

Kernel Version

5.10.16

Distro Version

Ubuntu 20.04

Other Software

PyTorch '1.10.2+cu113'

Repro Steps

Expected Behavior

The expected behavior is for torch.cuda.is_available() and tf.test.is_gpu_available() to return True.

Actual Behavior

Instead, both torch.cuda.is_available() and tf.test.is_gpu_available() return False.
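
For reference, the two checks boil down to the snippet below (a minimal sketch, not the exact script used here; tf.config.list_physical_devices('GPU') is the non-deprecated replacement for tf.test.is_gpu_available()):

# gpu_check.py - minimal sketch of the GPU visibility checks in this report
import torch
import tensorflow as tf

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tensorflow:", tf.__version__, "| GPUs:", tf.config.list_physical_devices("GPU"))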

Diagnostic Logs

Nvidia SMI

❯ nvidia-smi
Sun Feb 27 23:12:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.52       Driver Version: 511.79       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P8    13W /  N/A |   2140MiB /  8192MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Nvidia Cuda Toolkit Version

❯ /usr/local/cuda-11/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

Running Blackscholes Test

❯ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.259223 msec
Effective memory bandwidth: 308.614999 GB/s
Gigaoptions per second    : 30.861500

BlackScholes, Throughput = 30.8615 GOptions/s, Time = 0.00026 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

Running PyTorch torch.cuda.is_available

❯ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
>>> torch.cuda.is_available()
False

Running Tensorflow tf.test.is_gpu_available()

❯ python
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.8.0'
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-02-28 00:13:23.308472: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-28 00:13:23.492434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-02-28 00:13:23.502191: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-02-28 00:13:23.502241: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
False
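
The key line above is the dlerror for libcudnn.so.8. Whether that library is resolvable from the same environment can be tested with the standard library alone (a minimal sketch, assuming the default loader search path):

# sketch: try to dlopen the library TensorFlow reports as missing
import ctypes

try:
    ctypes.CDLL("libcudnn.so.8")  # soname taken from the dlerror above
    print("libcudnn.so.8 loaded OK")
except OSError as err:
    print("libcudnn.so.8 not loadable:", err)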

Running PyTorch collect_env.py

❯ python collect_env.py
Collecting environment information...
PyTorch version: 1.10.2+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 11.4.152
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version: 511.79
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.2
[pip3] torch==1.10.2+cu113
[pip3] torchvision==0.11.3+rocm4.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0            py39h7f8727e_0
[conda] mkl_fft                   1.3.1            py39hd3c417c_0
[conda] mkl_random                1.2.2            py39h51133e4_0
[conda] numpy                     1.21.2           py39h20f2e39_0
[conda] numpy-base                1.21.2           py39h79a1101_0
[conda] pytorch                   1.10.2          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.10.2               py39_cu113    pytorch
[conda] torchvision               0.11.3               py39_cu113    pytorch
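
collect_env shows "cuDNN version: Could not collect", but the cu113 pip wheel ships its own cuDNN, so the runtime view is easier to read from torch directly (a small sketch using standard torch APIs):

# sketch: query the CUDA/cuDNN versions the installed wheel was built with
import torch

print("built with CUDA:", torch.version.cuda)        # expected '11.3' for a cu113 wheel
print("bundled cuDNN:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))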

Running Natively on Windows 11

(cuda) PS C:\Users\Abdul Hamid Merii> python
Python 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
>>> torch.cuda.is_available()
True
elsaco commented 2 years ago

@AH-Merii you might need to install libcudnn8 since it's not found. apt install libcudnn8 on Ubuntu will fix that!

I was facing the same issue and after installing libcudnn8 it worked. Notice the last line:

tux@ubuntu:~$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
2022-02-27 18:09:22.945826: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-02-27 18:09:22.949805: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-02-27 18:09:22.950096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

There's a NUMA warning because CONFIG_NUMA is not set in the WSL kernel. I'll build a new WSL kernel with NUMA enabled to see if those warnings go away.
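
Before rebuilding, the current setting can be confirmed from the running kernel, assuming it exposes its config at /proc/config.gz (a minimal sketch):

# sketch: report the NUMA options of the running kernel
import gzip

with gzip.open("/proc/config.gz", "rt") as f:
    numa = [line.strip() for line in f if "CONFIG_NUMA" in line]

print("\n".join(numa) or "no CONFIG_NUMA entries found")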

This is from inside Python:

tux@ubuntu:~$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.5.1+cu101'
>>> torch.cuda.is_available()
True
>>>
AH-Merii commented 2 years ago

@elsaco I just installed libcudnn8, but I am still unable to detect the GPU.
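
Worth noting: torch.cuda.is_available() should not depend on the system libcudnn8 at all, since the cu113 pip wheel bundles its own cuDNN, so one thing to check next is whether the dynamic loader can even see the WSL driver library (libcuda.so.1). A minimal sketch:

# sketch: ask the loader (via ldconfig) whether it knows the driver and cuDNN libraries
import ctypes.util

for name in ("cuda", "cudnn"):
    print(f"lib{name}:", ctypes.util.find_library(name) or "not found by the loader")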

abbygriffiths commented 2 years ago

@AH-Merii Try installing the dev version of cuDNN too: sudo apt install libcudnn8-dev

jerry871004 commented 2 years ago

@AH-Merii Did you solve this problem? I'm facing the same one.

microsoft-github-policy-service[bot] commented 8 months ago

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue, please re-file it as a new issue or feature request.

Thank you!