Open Spirit4471 opened 4 months ago
If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise, please attach logs by following the instructions below; your issue will not be reviewed unless they are added. These logs will help us understand what is going on on your machine.
Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!
The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.
@Spirit4471: The nvidia-smi output seems to be correct. What exactly is the issue here?
The nvidia-smi output is not correct: you can see that the CUDA version is ERR!, and print(f"CUDA is available: {torch.cuda.is_available()}") returns False.
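For anyone trying to reproduce the symptom described above, here is a minimal sketch that checks the CUDA field in the nvidia-smi header without needing PyTorch. The function names are hypothetical, and it assumes the header contains a "CUDA Version: ..." field as in the output posted in this thread.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches the header field printed by nvidia-smi,
# e.g. "CUDA Version: 12.4" or "CUDA Version: ERR!".
_CUDA_FIELD = re.compile(r"CUDA Version:\s*(\S+)")


def parse_cuda_version(smi_text: str) -> Optional[str]:
    """Return the CUDA version from nvidia-smi output, or None if it is missing or ERR!."""
    match = _CUDA_FIELD.search(smi_text)
    if match is None or match.group(1) == "ERR!":
        return None
    return match.group(1)


def cuda_version_from_system() -> Optional[str]:
    """Run nvidia-smi if it is on PATH and parse its header; None means no usable version."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
    return parse_cuda_version(result.stdout)
```

Calling cuda_version_from_system() inside WSL2 and getting None would match the ERR! case reported here. It does not explain the cause.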
NVIDIA-SMI 545.46 Driver Version: 546.80 CUDA Version: ERR!
$ nvidia-smi
Sat May 18 02:31:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.73.01 Driver Version: 552.12 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GT 1030 On | 00000000:01:00.0 On | N/A |
| 30% 36C P8 N/A / 30W | 1026MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 105 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
You need to update the nvidia GPU Windows drivers.
But when I run nvidia-smi in the Windows terminal, the output is correct. When I run nvidia-smi in WSL2, it shows ERR! for the CUDA version.
The WSL2 version depends on the Windows version. The nvidia-smi ELF64 binary inside WSL2 updates automatically after installing the latest Windows drivers for your GPU.
In fact, in WSL2 the folder where nvidia-smi resides, /usr/lib/wsl/lib, is just a mount that points to the Windows DriverStore folder.
That's why even NVIDIA itself recommends:
Install the Windows 11 nvidia display driver. This is the only driver you need to install. Do not install any Linux display driver in WSL.
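To check that the mount described above is actually present inside the distro, something like the following sketch could be used. The function name and the list of expected files are assumptions for illustration; the exact set of files in /usr/lib/wsl/lib can vary with the driver version.

```python
from pathlib import Path
from typing import Dict

# Assumed file names; adjust to whatever your driver version actually ships.
EXPECTED_FILES = ("nvidia-smi", "libcuda.so.1")


def check_wsl_driver_mount(lib_dir: str = "/usr/lib/wsl/lib") -> Dict[str, bool]:
    """Report, for each expected file, whether it exists under lib_dir."""
    root = Path(lib_dir)
    return {name: (root / name).exists() for name in EXPECTED_FILES}
```

If every value comes back False, the driver mount is probably missing from the distro, which would point at the Windows driver installation rather than at anything installed inside WSL.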
I know. I didn't install the CUDA toolkit in WSL2; WSL2 is using the driver on Windows. Actually, when I first set up the development environment, everything worked perfectly. After I restarted the computer, the development environment seemed to have a problem, and the nvidia-smi command showed ERR! for the CUDA version.
I would try to install another distro. If nvidia-smi works there without error then the problem could be some Ubuntu 22.04 package update.
I had a similar problem with Podman running on WSL2. In my case, the problem was fixed by generating the CDI spec again after a recent driver upgrade. According to the related NVIDIA CTK doc page, this needs to be done after each GPU driver update on the host machine:
If you change the device or CUDA driver configuration, you must generate a new CDI specification. A configuration change can occur when MIG devices are created or removed, or when the driver is upgraded.
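For the container case, regenerating the CDI spec could be scripted roughly as below. This is only a sketch: the helper names are hypothetical, the output path /etc/cdi/nvidia.yaml is the one used in NVIDIA's Container Toolkit documentation and may differ on your setup, and writing there usually requires root.

```python
import shutil
import subprocess
from typing import List


def cdi_generate_command(output_path: str = "/etc/cdi/nvidia.yaml") -> List[str]:
    """Build the nvidia-ctk command that regenerates the CDI specification."""
    return ["nvidia-ctk", "cdi", "generate", f"--output={output_path}"]


def regenerate_cdi_spec(output_path: str = "/etc/cdi/nvidia.yaml") -> bool:
    """Run the command if nvidia-ctk is installed; return True on success."""
    if shutil.which("nvidia-ctk") is None:
        return False  # NVIDIA Container Toolkit CLI not found on PATH
    # Assumption: the caller has permission to write output_path (typically root).
    result = subprocess.run(cdi_generate_command(output_path), check=False)
    return result.returncode == 0
```

This only applies when a container runtime such as Podman consumes the GPU through CDI; it would not change what nvidia-smi prints directly in the distro.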
Windows Version
Microsoft Windows [Version 10.0.22631.3593]
WSL Version
WSL version: 2.1.5.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.146.1
Distro Version
Ubuntu 22.04
Other Software
Visual Studio Code
Repro Steps
CUDA 11.7 PyTorch 1.13 cuDNN
Expected Behavior
Actual Behavior
nvidia-smi output:
Wed May 15 21:49:41 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.46 Driver Version: 546.80 CUDA Version: ERR! |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... On | 00000000:01:00.0 On | N/A |
| N/A 54C P8 16W / 80W | 816MiB / 6144MiB | 3% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
The code worked only the first time, right after I configured the WSL2 + Ubuntu + CUDA + Python + PyTorch development environment. After I rebooted the computer, my code could no longer find an available CUDA device or GPU.
Diagnostic Logs
No response