microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

Yet another “Driver Not Loaded / can’t communicate with the NVIDIA driver” error on Windows 21376co_release.210503-1432 #6925

Closed Marietto2008 closed 10 months ago

Marietto2008 commented 3 years ago

Windows Build Number

21376co_release.210503-1432

WSL Version

Kernel Version

5.10.16.3-microsoft-standard-WSL2

Distro Version

ubuntu 20.04

Other Software

Docker version 20.10.6, build 370c289 (installed with sudo apt-get install nvidia-docker2)

Repro Steps

These are the commands that I issued (taken from here: https://dilililabs.com/zh/blog/2021/01/26/deploying-docker-with-gpu-support-on-windows-subsystem-for-linux/):

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub

sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

sudo apt-get update

sudo apt-get install cuda-toolkit-11-0

curl https://get.docker.com | sh

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

sudo apt-get update

sudo apt-get install nvidia-docker2 cuda-toolkit-11-0 cuda-drivers

sudo service docker start

Expected Behavior

I expected the container and nvidia-smi to be able to communicate with the NVIDIA driver.

Actual Behavior

docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

Unable to find image 'nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04' locally
11.0-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
171857c49d0f: Pull complete
419640447d26: Pull complete
61e52f862619: Pull complete
2a93278deddf: Pull complete
c9f080049843: Pull complete
8189556b2329: Pull complete
c306a0c97a55: Pull complete
4a9478bd0b24: Pull complete
19a76c31766d: Pull complete
Digest: sha256:11777cee30f0bbd7cb4a3da562fdd0926adb2af02069dad7cf2e339ec1dad036
Status: Downloaded newer image for nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

In addition:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Failed to properly shut down NVML: Driver Not Loaded

Diagnostic Logs

No response
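For anyone hitting the same nvidia-smi failure, here is a minimal sketch of a first check from inside the distro. It assumes that WSL2 GPU support exposes the device node /dev/dxg and maps the host driver's libraries under /usr/lib/wsl/lib. The function name check_wsl_gpu and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory. It does not prove the driver works; it only reports whether those pieces are visible.

```shell
# Sketch only: report whether the pieces WSL2 GPU support is assumed to
# rely on are visible. $1 is an optional (hypothetical) root prefix;
# leave it empty to inspect the real system.
check_wsl_gpu() {
    root="${1:-}"
    status=0

    # Assumed: the WSL2 kernel exposes the paravirtualized GPU as /dev/dxg.
    if [ -e "$root/dev/dxg" ]; then
        echo "ok: /dev/dxg present"
    else
        echo "missing: /dev/dxg"
        status=1
    fi

    # Assumed: the Windows host driver maps libcuda under /usr/lib/wsl/lib.
    if ls "$root"/usr/lib/wsl/lib/libcuda.so* >/dev/null 2>&1; then
        echo "ok: libcuda found in /usr/lib/wsl/lib"
    else
        echo "missing: libcuda in /usr/lib/wsl/lib"
        status=1
    fi

    return "$status"
}
```

Calling check_wsl_gpu with no argument inspects the running distro; a non-zero exit status means at least one of the two items was not found.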

tianguangye commented 3 years ago

Hello, I encountered the same problem with the same software versions. I started installing WSL2 GPU support 2 days ago on a newly activated Windows 10 notebook (WIP Build 21376.co_release.210503-1432).

According to the discussion in issue-6773, preview build 21359 fixes this problem. Is there a build in between that is reliable enough to roll back to in order to use the GPU under WSL2?
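To compare build strings like the ones quoted in this thread, a small sketch follows. The helper names build_number and build_at_least are hypothetical, and the threshold is whatever build you choose to test against (21359 is only the number mentioned above).

```shell
# Extract the leading numeric build from strings such as
# "21376.co_release.210503-1432" or "21376co_release.210503-1432".
build_number() {
    # Strip everything from the first non-digit character onward.
    printf '%s\n' "${1%%[!0-9]*}"
}

# Succeeds when the build in $1 is at least the threshold given in $2.
build_at_least() {
    [ "$(build_number "$1")" -ge "$2" ]
}
```

For example, build_at_least "21376.co_release.210503-1432" 21359 succeeds. Since build 21376 still shows the error in this thread, a numeric comparison alone should not be read as proof that a given build works.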

Marietto2008 commented 3 years ago

Which fixes are you using?

Marietto2008 commented 3 years ago

The workaround is here:

https://github.com/NVIDIA/nvidia-docker/issues/1496#issuecomment-830256689

tianguangye commented 3 years ago

Well, I tried the fixes, but the same problem you described above remains, both when invoking the nvidia-smi command and when running the docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 container.

Marietto2008 commented 3 years ago

Try following this; maybe you got something wrong because the instructions were unclear: https://forums.developer.nvidia.com/t/yet-another-driver-not-loaded-cant-communicate-with-the-nvidia-driver-error-while-trying-to-deploy-a-docker-container-with-gpu-support-on-wsl2/177396/2

tianguangye commented 3 years ago

Hello,

I was able to launch a Jupyter notebook (tensorflow/tensorflow:latest-gpu-py3-jupyter) under WSL2 Ubuntu 18.04 and train a classifier with GPU support.

Many thanks for your help!

It seems, however, that the error from the 'nvidia-smi' command still exists. Looking forward to a future update of the NVIDIA driver!

Guangye

microsoft-github-policy-service[bot] commented 10 months ago

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!