microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

nvidia-smi segmentation fault in wsl2 but not in Windows #11277

Open themizzi opened 4 months ago

themizzi commented 4 months ago

Windows Version

10.0.22631.3235

WSL Version

2.1.4.0

Are you using WSL 1 or WSL 2?

WSL 2

Kernel Version

5.15.146.1-2

Distro Version

Ubuntu 22.04

Other Software

GeForce GTX 1650 Ti with GeForce Game Ready Driver version 551.76

Repro Steps

Run nvidia-smi in Windows and get the following:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   63C    P8              3W /   50W |     163MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3268    C+G   ...ekyb3d8bbwe\WsaClient\WsaClient.exe      N/A      |
|    0   N/A  N/A     18112    C+G   ...ience\NVIDIA GeForce Experience.exe      N/A      |
+-----------------------------------------------------------------------------------------+

Run nvidia-smi in wsl2 Ubuntu and get the following:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 551.76       CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
[1]    2058 segmentation fault  nvidia-smi

Expected Behavior

I am expecting no segmentation fault and successful output in WSL 2.

Actual Behavior

I get a segmentation fault in WSL2 as described above.

Diagnostic Logs

No response

themizzi commented 4 months ago

Interestingly, glxinfo does not report my nvidia GPU:

❯ glxinfo | grep "Device"
Device: D3D12 (Intel(R) UHD Graphics) (0xffffffff)

terrificobjects commented 4 months ago

I am also seeing this issue. Interestingly, the nvidia-smi version is different on mine:

NVIDIA-SMI 545.29.06 Driver Version: 551.61 CUDA Version: 12.4

and glxinfo returns the same output as the poster above.

jaubourg commented 3 months ago

I have the exact same issue.

Environment

WSL version: 2.1.4.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3155

Distro is Ubuntu-22.04.

nvidia-smi in windows gives this:

Tue Mar 12 02:47:51 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   60C    P0             15W /   50W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...  WDDM  |   00000000:40:00.0 Off |                  N/A |
|  0%   52C    P8             23W /  215W |     713MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    1   N/A  N/A      2360    C+G   ...8.0_x64__cv1g1gvanyjgm\WhatsApp.exe      N/A      |
|    1   N/A  N/A      5168    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    1   N/A  N/A      8560    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    1   N/A  N/A      8980    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    1   N/A  N/A     13284    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    1   N/A  N/A     15352    C+G   ...Brave-Browser\Application\brave.exe      N/A      |
+-----------------------------------------------------------------------------------------+

But inside Ubuntu under WSL2:

Tue Mar 12 02:47:41 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

Everything was working fine last week. I didn't install anything new in Ubuntu, I only updated nvidia drivers in Windows and I highly suspect that's the problem. I sadly don't remember which version of the drivers I had before since I hadn't updated for quite some time (not that I know how to downgrade drivers to test my theory).

This is a major blocker; I need CUDA for my daily work.

jaubourg commented 3 months ago

Also, lspci returns this:

3a32:00:00.0 3D controller: Microsoft Corporation Device 008e
3aca:00:00.0 3D controller: Microsoft Corporation Device 008e
b0ae:00:00.0 3D controller: Microsoft Corporation Device 008e
c7e3:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
cec6:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)

I don't know whether this is normal or whether the two NVIDIA cards should be reported as NVIDIA devices.

terrificobjects commented 3 months ago

Just to update: I thought this segmentation fault was causing an issue, but I am using a Tesla P4 with data center drivers. I was unable to use PyTorch or anything else until I rebooted into safe mode, ran DDU, and clean-installed my drivers. I still had issues after reinstalling, so I went into WSL, removed all NVIDIA and CUDA packages, and did the reboot/DDU/clean reinstall once more; now I can use CUDA normally. I still see regular nvidia-smi output in PowerShell and a segmentation fault in WSL, but I can run all my applications.

I'm posting this in case someone else, like me, mistakes this segfault for the cause of a different problem they are having.

jaubourg commented 3 months ago

To be precise: I use CUDA inside Docker containers launched from Ubuntu under WSL2, and the graphics cards are no longer found there, although they were before. So, at least in my case, the issue is not limited to nvidia-smi. It worked flawlessly before, and I didn't change anything inside Ubuntu.

zcobol commented 3 months ago

@themizzi if glxinfo shows an Intel device and you want the NVIDIA one, set the MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia environment variable and run the command again, or just run MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia glxinfo -B if you don't want to make it permanent.

glxinfo output using default settings:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (Intel(R) UHD Graphics 750) (0xffffffff)

and with MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia set:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 4070 SUPER) (0xffffffff)
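The one-off option zcobol describes can be sketched as a small shell helper. `glx_device_for` is a hypothetical name. It only wraps the `glxinfo -B` call from the comment, sets the variable for that single invocation, and keeps the `Device:` line.

```sh
#!/bin/sh
# Hypothetical helper: print the "Device:" line glxinfo reports when Mesa's
# D3D12 driver is asked to prefer the adapter whose name matches $1.
glx_device_for() {
    # The VAR=value prefix applies only to this glxinfo process (the
    # "not permanent" option), so the calling shell is left unchanged.
    MESA_D3D12_DEFAULT_ADAPTER_NAME="$1" glxinfo -B | grep 'Device:'
}

# Permanent alternative (assumption: bash is the interactive shell):
#   echo 'export MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia' >> ~/.bashrc
```

For example, `glx_device_for nvidia` should print a `Device:` line like the second one above when the NVIDIA adapter is selected.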

It looks like nvidia-smi crashes with the Game Ready Driver (GRD). It doesn't crash with the Studio Driver (SD):

Tue Mar 12 17:24:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   33C    P8              3W /  220W |     540MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

GitHubUserC commented 3 months ago

I'm running on exactly the same environment and experiencing the same problems as @themizzi .

mwkldeveloper commented 3 months ago

The same here: segmentation fault in WSL2 but not in Windows.

elsaco commented 3 months ago

@themizzi which nvidia-smi are you running inside WSL? The correct one is part of the Nvidia Windows driver installation and is mounted in WSL under /usr/lib/wsl/lib. What is the output of command -v nvidia-smi in WSL?
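A minimal sketch of the check elsaco asks for: `command -v` prints the path the shell would actually execute. The helper name `nvidia_smi_origin` is hypothetical. It assumes that any copy found earlier on PATH (for example, one installed by an Ubuntu package) is what runs instead of the WSL-mounted one.

```sh
#!/bin/sh
# Hypothetical helper: report which nvidia-smi the shell resolves first and
# whether it is the copy the Windows driver mounts under /usr/lib/wsl/lib.
nvidia_smi_origin() {
    # command -v prints the resolved path, or fails if nothing is found.
    p=$(command -v nvidia-smi) || { echo "not found on PATH"; return 1; }
    case "$p" in
        /usr/lib/wsl/lib/*) echo "WSL-mounted copy: $p" ;;
        *)                  echo "other copy found first on PATH: $p" ;;
    esac
}
```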

Output using Game Ready Driver ver. 551.67:

Wed Mar 13 09:48:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   32C    P8              3W /  220W |     431MiB /  12282MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Notice the nvidia-smi version, it's 550.60.01
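The header line mixes the version of the Linux nvidia-smi binary with the Windows driver version, so extracting all three numbers can help when comparing reports in this thread. This is a hypothetical sed-based helper (`parse_smi_header` is not an NVIDIA tool) that assumes the header layout shown above.

```sh
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print
#   "<nvidia-smi version> <driver version> <CUDA version>"
# from the first header line of the form
#   | NVIDIA-SMI 550.60.01   Driver Version: 551.76   CUDA Version: 12.4 |
parse_smi_header() {
    sed -n 's/.*NVIDIA-SMI \([0-9.]*\)  *Driver Version: \([0-9.]*\)  *CUDA Version: \([0-9.]*\).*/\1 \2 \3/p' |
        head -n 1
}
```

For example, apply it to saved output, or to `nvidia-smi.exe | parse_smi_header` run from WSL.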

Rui-K commented 3 months ago

Same issue, but with even less information in WSL2: nothing is printed except "Segmentation fault", not even the header with the version.

nocturneatfiftyhz commented 3 months ago

Same problem here. nvidia-smi works fine on Win11 but gives a segmentation fault on WSL2.


C:\Users\X>nvidia-smi
Fri Mar 15 01:12:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8              4W /   50W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

C:\Users\X>wsl.exe
x@DESKTOP-NHNBGBN:/mnt/c/Users/X$ nvidia-smi
Fri Mar 15 01:12:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

jmangelson commented 3 months ago

I am seeing the same behavior.

Do we know if this is a problem with a specific driver?

Rui-K commented 3 months ago

Update: I didn't change anything. I rebooted my computer and, without running nvidia-smi in Windows first, ran it directly in WSL; it worked with no error.

GitHubUserC commented 3 months ago

That doesn't work for me, @Rui-K.

jmangelson commented 3 months ago

I still see the segfault.

However, if I run nvidia-smi.exe from within WSL it displays correctly.

Additionally, if I try running programs that use CUDA they do run.

elsaco commented 3 months ago

@jaubourg you're just launching a Windows executable from within WSL: a PE binary at /mnt/c/Windows/system32/nvidia-smi.exe. The issue is with the Linux version of nvidia-smi, which is also mounted from Windows under /usr/lib/wsl/lib but is a different binary:

[elsaco@texas ~]$ file /mnt/c/Windows/system32/nvidia-smi.exe
/mnt/c/Windows/system32/nvidia-smi.exe: PE32+ executable (console) x86-64, for MS Windows, 7 sections
[elsaco@texas ~]$ file /usr/lib/wsl/lib/nvidia-smi
/usr/lib/wsl/lib/nvidia-smi: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=db77481740c9f1334e47d8f2ffde53b34b2bc0dc, stripped
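The difference `file` reports comes down to each file's first bytes: ELF binaries begin with `0x7f 'E' 'L' 'F'`, while Windows PE executables begin with the DOS `MZ` header. The following sketch shows that check. `binary_kind` is a hypothetical helper name, and it assumes `head -c` is available (GNU coreutils and BusyBox both provide it).

```sh
#!/bin/sh
# Hypothetical helper: classify a file by its leading magic bytes.
#   7f 45 4c 46 = "\177ELF" (Linux ELF), 4d 5a = "MZ" (DOS/PE header)
binary_kind() {
    # Dump the first 4 bytes as hex and drop od's spacing and newline.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        7f454c46) echo "ELF (Linux)" ;;
        4d5a*)    echo "PE/MZ (Windows)" ;;
        *)        echo "unknown" ;;
    esac
}
```

Run against the two paths above, `binary_kind /usr/lib/wsl/lib/nvidia-smi` and `binary_kind /mnt/c/Windows/system32/nvidia-smi.exe` should agree with the `file` output.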

The nvidia-smi utility doesn't use any of the CUDA libraries. What is the output of ldd -v /usr/lib/wsl/lib/nvidia-smi? Are there any unresolved libraries?

This is the output on my system:

[elsaco@texas ~]$ ldd -v /usr/lib/wsl/lib/nvidia-smi
        linux-vdso.so.1 (0x00007ffcde975000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fce8e56a000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fce8e489000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fce8e484000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fce8e2a2000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fce8e29d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fce8e577000)

        Version information:
        /usr/lib/wsl/lib/nvidia-smi:
                librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
                libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
                libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
                libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
                libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libpthread.so.0:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libm.so.6:
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
        /lib64/libdl.so.2:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libc.so.6:
                ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        /lib64/librt.so.1:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6

and both Windows and Linux nvidia-smi work.
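The "unresolved libs" question can be answered by filtering ldd's `=> not found` marker. This is a minimal sketch. `unresolved_libs` is a hypothetical name, and it expects plain `ldd` output on stdin.

```sh
#!/bin/sh
# Hypothetical helper: print the names of libraries ldd could not resolve.
# ldd marks them as "libname.so.N => not found"; field 1 is the name.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}
```

`ldd /usr/lib/wsl/lib/nvidia-smi | unresolved_libs` prints nothing when every library resolves, as in the listing above.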

zivchen9993 commented 3 months ago

I had the same issue, and it kept me up for a long time. What fixed it for me was uninstalling the NVIDIA driver (which I had updated to 551.76) and installing an older one: 537.58 from October 2023 in my case, though it was a fairly random choice (not 551.61, which also didn't work). I have an MX550, so the drivers correspond to that GPU model. I hope it helps you as well.

AlexTo commented 3 months ago

I think it is an issue with the NVIDIA 551 driver. It worked for me with the previous NVIDIA 537 driver, but after upgrading I got the segmentation fault in WSL2 as well.

Triple-Z commented 3 months ago

Same; downgrading to 537 solved my problem.

nocturneatfiftyhz commented 3 months ago

I uninstalled the NVIDIA driver and installed v537.58 as advised in the last few days, and the Segmentation fault on WSL2 disappeared. Thanks for the replies guys!

GitHubUserC commented 3 months ago

Same; downgrading to 537 solved my problem.

asasine commented 3 months ago

Tried a few different versions and it seems like everything 538+ is broken.
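The downgrade reports above can be checked from inside WSL by reading the Windows driver version through nvidia-smi.exe (which this thread shows still works) and comparing it with 537.58, the version several commenters downgraded to. This is a sketch under stated assumptions: the helper names are hypothetical, and the cut-off is only approximate. asasine reports 538+ as broken, but waarrk later reports 538.33 (R535 branch) working.

```sh
#!/bin/sh
# Hypothetical helpers; 537.58 is taken from the reports in this thread.

# Succeeds when dotted version $1 is strictly newer than $2 (major.minor).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n | head -n 1)" = "$2" ]
}

# Windows driver version of the first GPU, with the CR from Windows output removed.
windows_driver_version() {
    nvidia-smi.exe --query-gpu=driver_version --format=csv,noheader | head -n 1 | tr -d '\r'
}

check_driver() {
    v=$1
    if version_gt "$v" 537.58; then
        echo "$v: newer than 537.58"
    else
        echo "$v: at or below 537.58"
    fi
}
```

Usage from WSL: `check_driver "$(windows_driver_version)"`.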

eyabesbes commented 3 months ago

Hi, I'm having the same issue on WSL2 but not on Windows 11. How can I downgrade to 537?

Tue Apr  9 20:40:24 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

elsaco commented 3 months ago

@eyabesbes you have to uninstall the current Nvidia Windows driver then install version 537. Nvidia WSL libraries are part of the Windows installer.

NVIDIA Studio drivers seem to work okay:

[elsaco@texas ~]$ nvidia-smi
Tue Apr  9 15:57:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4000               On  |   00000000:61:00.0  On |                  Off |
| 41%   34C    P8             10W /  140W |     610MiB /  16376MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

CaptainRui1000 commented 3 months ago

Good news: found the solution. Bad news: the RTX 4080 Super has no previous driver to downgrade to. Sad.

maxzaikin commented 2 months ago

Having the exact same issue here. nvidia-smi (Windows output):

Sat Apr 13 04:58:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.86                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T1200 Laptop GPU      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0             12W /   45W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvidia-smi (Ubuntu on WSL2):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

I'm on Windows 11 Pro with the latest updates.

Has anybody figured out how to fix this issue? I think it is the root cause of TensorFlow not seeing my GPU... sad

maxzaikin commented 2 months ago

Here is my solution so far: as of Apr 13, 2024, I downgraded from 551.86-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql (built on CUDA 12.4) to 536.96-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql (built on CUDA 12.2), and the segmentation fault disappeared. It seems to me the problem is that the Ubuntu WSL2 image on Windows does not support the latest NVIDIA drivers.

NickAcPT commented 2 months ago

I'm also facing a segfault only inside WSL2 but not when running it through Windows.

WSL version: 2.2.2.0
Kernel version: 5.15.150.1-2
Windows version: 10.0.22631.3447
Driver Version: 551.86 (GRD)
CUDA Version: 12.4

Haven't tried downgrading drivers just yet.

EDIT: I can't replicate the workaround mentioned in https://github.com/microsoft/WSL/issues/11277#issuecomment-2046172125. The Studio Driver still segfaults no matter what.

waarrk commented 2 months ago

The same error occurred. Nvidia RTX A500 Notebook GPU. When I downgraded from R550 U5 (552.22) to R535 U11 (538.33), the segmentation fault disappeared.

JaneConan commented 2 months ago

Same error. GPU 1:

NVIDIA GeForce RTX 3050 Ti Laptop GPU

Driver version: 31.0.15.5222
Driver date: 2024/4/11
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 243, device 0, function 0

Utilization: 0%
Dedicated GPU memory: 0.0/4.0 GB
Shared GPU memory: 0.0/15.9 GB
GPU memory: 0.0/19.9 GB

msk-nightly commented 2 months ago

Same issue. Using the Game Ready Driver for GTX 960M on Windows 10 x64 version 22H2.

Running nvidia-smi in CMD on Windows outputs:

Thu May  2 17:05:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 960M      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A    0C    P8             N/A /  200W |      83MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Running nvidia-smi in Ubuntu on WSL2 outputs:

Thu May  2 16:48:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01              Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

Running nvidia-smi.exe in Ubuntu on WSL2 outputs:

Thu May  2 17:02:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 960M      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A    0C    P8             N/A /  200W |     105MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

gnaaromat commented 2 months ago

As said before, just install the "normal" Windows 537 driver or an earlier one; the driver download page has older versions available. If you install CUDA for Windows, it will bump the driver to the latest version. In my case I didn't have to purge the Windows driver first; I just ran the installer for 537.xx (it doesn't matter which one NVIDIA displays for you).

msk-nightly commented 2 months ago

@gnaaromat Thank you for the advice. I did notice it before but still wanted to report the issue. I tried the fix and it worked immediately! I hope a fix is released for the latest driver versions, though.

suzshiro1024 commented 1 month ago

Same issue. Using Windows 10 Home 22H2, NVIDIA GeForce GTX 1650 Max-Q Design, Game Ready Driver 552.44 and CUDA 12.4. However, I'm using CUDA 12.1 in WSL2 in order to use PyTorch.


+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01              Driver Version: 552.44         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

suzshiro1024 commented 1 month ago

Sorry, I missed this measure, which might solve the problem. I'll give it a try. Thank you.

As said before, just install the "normal" Windows driver, version 537 or earlier. The download page has older driver versions available. If you install CUDA for Windows, it will bump the driver to the latest version. In my case, I didn't have to purge the Windows driver first; I just ran the installer for 537.xx (it doesn't matter which 537.xx release Nvidia displays for you).

suzshiro1024 commented 1 month ago

I downgraded to Game Ready 537.13 and it worked. Thank you.


+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.103                Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1650 ...    On  | 00000000:02:00.0 Off |                  N/A |
| N/A   48C    P8               3W /  30W |    128MiB /  4096MiB |     15%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        34      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Zhoneym commented 1 month ago

https://forums.developer.nvidia.com/t/geforce-gtx960m-nvidia-smi-segmentation-fault-in-wsl2/294822

Cafta commented 1 month ago

I was having exactly the same problem. I solved it by creating a '.wslconfig' file in 'C:\Users\\' with:
[wsl2]
memory=4GB
processors=2
gpu=true

Zhoneym commented 1 month ago

I was having exactly the same problem. I solved it by creating a '.wslconfig' file in 'C:\Users' with: [wsl2] memory=4GB processors=2 gpu=true

This method does not work.

elsaco commented 1 month ago

@Cafta I get "wsl: Unknown key 'wsl2.gpu'". Which version of WSL are you using for that syntax to be valid?

Pointer0111 commented 1 month ago

I downgraded to Game Ready 537.13 and it worked. Thank you.

It works perfectly, thank you!

Zhoneym commented 1 month ago

This problem has existed since 2024 (driver 538 and later versions) and has still not been solved.

Version 537.96 (GeForce Vulkan beta, supports CUDA 12.2 at most) works normally. The download link is attached:

https://developer.nvidia.com/downloads/vulkan-beta-53796-windows

To add: in my testing, only the desktop versions of the GTX 1050 Ti, RTX 2000, RTX 3050 and RTX 4070 do not have this issue. I have asked Linux kernel developers, and they said this issue needs to be fixed.

mattybaus commented 1 month ago

Dell XPS15 9500 - WSL2 - Nvidia GTX1650TI - GRD537.13 - Ubuntu 24.04 works

Zhoneym commented 1 month ago

Dell XPS15 9500 - WSL2 - Nvidia GTX1650TI - GRD537.13 - Ubuntu 24.04 works

This problem has existed since 2024 (driver 538 and later versions) and has still not been solved.

kziemski commented 3 weeks ago

Has anyone installed Ubuntu 24.04 with NVIDIA driver 555.x without breaking CUDA and Docker Desktop?

gnaaromat commented 3 weeks ago

Has anyone installed Ubuntu 24.04 with NVIDIA driver 555.x without breaking CUDA and Docker Desktop?

I've even tried booting 24.04 directly and ran into several issues until I reverted to 22.04. Better to wait for 24.04 to be fully supported, I guess.

Zhoneym commented 3 weeks ago

Has anyone installed Ubuntu 24.04 with NVIDIA driver 555.x without breaking CUDA and Docker Desktop?

I've even tried booting 24.04 directly and ran into several issues until I reverted to 22.04. Better to wait for 24.04 to be fully supported, I guess.

This has nothing to do with which Linux distribution you use; this issue will almost certainly occur whether you are using any version of Ubuntu, Debian, Rocky, or Arch Linux.

gnaaromat commented 3 weeks ago

Or with the fact that CUDA just isn't available for 24.04 yet... It's a driver nightmare you're bound to run into. But this isn't exactly on topic. All I'm saying is that Ubuntu 24.04 might also cause issues in WSL, and that CUDA does work with Ubuntu and Windows NVIDIA drivers older than 538. Granted, someone else here has Ubuntu 24.04 running, so the Ubuntu version in WSL probably doesn't matter as much :)