microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

Failed to initialize NVML: GPU access blocked by the operating system #9938

Open loliq opened 1 year ago

loliq commented 1 year ago

Windows Version

Windows 10 [19045.2728]

WSL Version

1.1.6.0

Are you using WSL 1 or WSL 2?

WSL 2

Kernel Version

Linux version 5.15.90.1-microsoft-standard-WSL2

Distro Version

Ubuntu 22.04

Other Software

No response

Repro Steps

I installed WSL 2 on Windows 10 [19045.2728] (screenshot attached).

In Windows, "nvidia-smi" output is: (screenshot attached)

But in WSL 2, the output is: (screenshot attached)

Below are the solutions I have tried that didn't work:

  1. Log in with administrator privileges
  2. Update the driver to the latest version
  3. Install the CUDA toolkit

My file list in C:\Windows\System32\lxss\lib is: (screenshot attached)
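
For reference, and as an assumption on my part rather than something stated above: the libraries under C:\Windows\System32\lxss\lib are the NVIDIA user-mode stubs that WSL2 maps read-only into every distro at /usr/lib/wsl/lib, so a quick sanity check on both sides looks roughly like this:

# Windows side (PowerShell): the host driver should list the GPU
nvidia-smi
dir C:\Windows\System32\lxss\lib

# WSL side (Ubuntu shell): the same libraries should show up here,
# in particular libcuda.so and libnvidia-ml.so.1, which nvidia-smi/NVML needs
ls -l /usr/lib/wsl/lib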

Expected Behavior

nvidia-smi should output the information related to my GPU, and I should be able to use it inside the WSL2 environment.

Actual Behavior

This is what happens when I launch nvidia-smi inside Ubuntu 22.04:

nvidia-smi

Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system

Diagnostic Logs

No response

fschvart commented 1 year ago

I have the exact same problem

OneBlue commented 1 year ago

/logs

microsoft-github-policy-service[bot] commented 1 year ago

Hello! Could you please provide more logs to help us better diagnose your issue?

To collect WSL logs, download and execute collect-wsl-logs.ps1 in an administrative PowerShell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed please upload the output files to this Github issue.

Click here for more info on logging

Thank you!

fschvart commented 1 year ago

Hi, in my case the issue was that WSL doesn't support A100 GPUs.

loliq commented 1 year ago

Hi, in my case the issue was that WSL doesn't support A100 GPUs.

Thank you very much, I guess that was the problem. I tried another machine which uses a 3060 and it works.
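
For context (this is my own note, not something confirmed in the thread): WSL GPU paravirtualization only exposes adapters running the WDDM driver model, and datacenter cards such as the A100 often come up in TCC mode, which tends to produce exactly this NVML error. A hedged way to check from an elevated PowerShell prompt on the Windows side:

# show the current and pending driver model per GPU (Windows only)
nvidia-smi --query-gpu=index,name,driver_model.current,driver_model.pending --format=csv
# if the card reports TCC and the driver allows it, WDDM can be requested
# (GPU index 0 is only an example; this needs admin rights, a reboot,
# and is not supported on every datacenter GPU/driver combination)
nvidia-smi -dm WDDM -i 0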

loliq commented 1 year ago

Thank you for replying, the attachment is the log file: WslLogs-2023-04-12_11-08-54.zip

lpdink commented 1 year ago

I have encountered a similar problem. nvidia-smi works well in WSL2, but it doesn't work properly in a Docker container started inside WSL2, failing with "Failed to initialize NVML: GPU access blocked by the operating system". (screenshot attached)

I use the official image provided by PyTorch and am confident that docker-ce and nvidia-container-toolkit have been installed correctly. In fact, when I use the same installation script on a native Ubuntu system, the GPU in the container works well. My system version info: (screenshots attached)

Looking forward to your reply, thank you in advance for your help.
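
For anyone trying to reproduce the Docker-in-WSL2 case, a minimal check looks roughly like the following; the image tag is only an illustrative assumption, not taken from the report above:

# inside the WSL2 distro, with docker-ce and nvidia-container-toolkit installed
nvidia-smi                                      # works directly in WSL2
docker run --rm --gpus all pytorch/pytorch:latest nvidia-smi
# on affected setups the second command fails inside the container with:
# Failed to initialize NVML: GPU access blocked by the operating system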

bsekachev commented 1 year ago

Exactly the same problem as @loliq has. Yesterday it worked fine, today it does not work anymore.

Windows updated last night. Installed: KB5025239, KB2267602, KB890830

alf-wangzhi commented 1 year ago

I encountered the same problem. Has it been resolved? Does this error mean that the GPU cannot be used?

lpdink commented 1 year ago

Yeah, just see: https://github.com/microsoft/WSL/issues/9962 @alf-wangzhi

alf-wangzhi commented 1 year ago

Thank you so much. It means a lot @lpdink

anton-petrov commented 1 year ago

The problem was solved, time to close this issue.

qwqawawow commented 3 months ago

I got this error on my machine

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22631.3737

and the kernel is 6.6.32-microsoft-standard-WSL2, compiled by myself. Any suggestions?
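
One thing worth ruling out with a self-compiled kernel (my suggestion, not something verified in this thread): WSL2 GPU support relies on the dxgkrnl paravirtualization driver, so a kernel built without it leaves the distro with no /dev/dxg device and NVML cannot reach the GPU at all. A rough check from inside the distro:

# the GPU paravirtualization device node should exist
ls -l /dev/dxg
# kernel messages mentioning dxgkrnl indicate the driver was built in and probed
dmesg | grep -i dxg
# when building a custom kernel, make sure CONFIG_DXGKRNL is enabled
# (Microsoft's config-wsl in the WSL2-Linux-Kernel repo already sets it)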

bert-jan commented 3 months ago

I have the same issue with Nvidia A16.

WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.3570

Zhiwei-Zhai commented 1 month ago

I have the same issue, with:

System: Windows 10, 22H2, 19045.4651; GPU: Nvidia Tesla V100; Distro: Ubuntu-22.04

WSL version: 2.3.13.0
Kernel version: 6.6.36.3-1
WSLg version: 1.0.63
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.19045.4651

Rahman2001 commented 1 month ago

Here, I found a solution in #9962 that says:

Inside the file /etc/nvidia-container-runtime/config.toml change no-cgroups from true to false

It worked for me. Hope it does for you too.
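
To spell that fix out (the sed one-liner and the CUDA image tag below are just one way to do it, assuming the key is present and uncommented in your config file):

# inside the WSL2 distro
sudo sed -i 's/no-cgroups = true/no-cgroups = false/' /etc/nvidia-container-runtime/config.toml
# restart Docker so the NVIDIA container runtime picks up the change
sudo service docker restart
# verify GPU access from a container again
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi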