microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License
17.45k stars 822 forks

WSL2 & CUDA does not work [v20226] #6014

Closed noofaq closed 4 years ago

noofaq commented 4 years ago

Environment

Windows build number: 10.0.20226.0
Your Distribution version: 18.04 / 20.04
Whether the issue is on WSL 2 and/or WSL 1: Linux version 4.19.128-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Tue Jun 23 12:58:10 UTC 2020

Steps to reproduce

Exactly followed the instructions available at https://docs.nvidia.com/cuda/wsl-user-guide/index.html. Tested on a previously working Ubuntu WSL image (IIRC GPU last worked on build 20206, then the whole of WSL2 stopped working). Also tested on newly created Ubuntu 18.04 and Ubuntu 20.04 images.

I have tested CUDA-compatible NVIDIA drivers 455.41 and 460.20. I have tried removing all drivers, etc. I have also tested with CUDA 10.2 and CUDA 11.0.

It was tested on two separate machines (one Intel + GTX 1060, the other Ryzen + RTX 2080 Ti).

The issue was tested directly in the OS and also inside Docker containers.

Example (directly in Ubuntu):

piotr@DESKTOP-FS6J3NT:/usr/local/cuda/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Turing" with compute capability 7.5

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
CUDA error at BlackScholes.cu:116 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_CallResult, OPT_SZ)"

Example in container:

piotr@DESKTOP-FS6J3NT:/mnt/c/Users/pppnn$ docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter python
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-10-01 14:18:07.538627: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-10-01 14:18:07.624188: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-10-01 14:18:32.359457: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-01 14:18:32.398949: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3200035000 Hz
2020-10-01 14:18:32.402692: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d06b70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-01 14:18:32.402748: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-01 14:18:32.409370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-01 14:18:32.877228: W tensorflow/compiler/xla/service/platform_util.cc:276] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
2020-10-01 14:18:32.877370: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2020-10-01 14:18:32.879904: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:32.880192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:18:32.880277: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:18:32.880340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:18:32.959947: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:18:32.973554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:18:33.111736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:18:33.127902: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:18:33.128018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:18:33.128535: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:33.129170: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:33.129403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-10-01 14:18:33.131671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/test_util.py", line 1513, in is_gpu_available
    for local_device in device_lib.list_local_devices():
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
>>>
>>>
>>>
>>>
>>> tf.config.list_physical_devices('GPU')
2020-10-01 14:18:55.610151: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.610510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:18:55.610579: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:18:55.610623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:18:55.610676: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:18:55.610719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:18:55.610762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:18:55.610805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:18:55.610846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:18:55.611251: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.611765: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:18:55.611999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>>
>>>
>>>
>>> tf.test.gpu_device_name()
2020-10-01 14:20:08.762060: W tensorflow/compiler/xla/service/platform_util.cc:276] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
2020-10-01 14:20:08.762222: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2020-10-01 14:20:08.762863: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.763201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:1d:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.665GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-10-01 14:20:08.763263: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-10-01 14:20:08.763316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-10-01 14:20:08.763358: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-10-01 14:20:08.763379: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-10-01 14:20:08.763428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-10-01 14:20:08.763480: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-10-01 14:20:08.763533: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-01 14:20:08.763898: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.764536: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:1d:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-10-01 14:20:08.764810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/test_util.py", line 112, in gpu_device_name
    for x in device_lib.list_local_devices():
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: all CUDA-capable devices are busy or unavailable
>>>

Expected behavior

CUDA working inside WSL2

Actual behavior

All tests using CUDA inside WSL Ubuntu fail with various CUDA errors, mostly referring to no CUDA devices being available.
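For reference, this class of failure can be reproduced without TensorFlow or the CUDA samples. A minimal Python probe (a sketch, not from the original report) loads the real driver library `libcuda.so.1` via ctypes and calls `cuInit`, which is the first call every CUDA program makes; it degrades gracefully when the library is absent:

```python
import ctypes

def probe_cuda():
    """Try to load libcuda and call cuInit(0).

    Returns 'ok', 'libcuda not found', or 'cuInit error <code>'.
    cuInit has the signature CUresult cuInit(unsigned int Flags).
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        # The driver library is not on the loader path at all
        # (common symptom when the WSL2 GPU plumbing is broken).
        return "libcuda not found"
    rc = libcuda.cuInit(0)
    return "ok" if rc == 0 else f"cuInit error {rc}"

if __name__ == "__main__":
    print(probe_cuda())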

Agrover112 commented 4 years ago

@wanfuse Maybe

tadam98 commented 4 years ago

I downgraded as part of recovery from 20226 back to 20221 without any problem. Everything worked as before the downgrade and wsl2 started working well.

Just to be on the safe side I made a full system backup beforehand.

If you have had your PC for some time and "passed through" 20221, try several cycles of recovery to the previous Windows version.

If you never had it, e.g. your PC is newer than 2 weeks, you can install 20221 from the ISO version while keeping all your files. This is a safe procedure.

Nevertheless do not skip the backup. (Both Acronis and EaseUS Todo Backup work fine for me in making and restoring disk clones including all partitions).

You can get the 20221.1000 iso image in one of the links mentioned in previous messages.

Good luck, Mickey

onomatopellan commented 4 years ago

Build 20236 blog post announcement:

We fixed a regression that was breaking NVIDIA CUDA vGPU acceleration in the Windows Subsystem for Linux. Please see this GitHub thread for full details.

FSchoettl commented 4 years ago

Build 20236 fixed it for me 👍

askourtis commented 4 years ago

Can confirm that it works on 20236

Agrover112 commented 4 years ago

Wait I received the new update too gotta check it out

basarane commented 4 years ago

Cuda on WSL2 works perfectly on build 20236. The problem seems to be resolved.

tadam98 commented 4 years ago

Works for me too:

$ python
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
...
2020-10-15 00:53:25.186406: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
True

$ docker stats
$ docker stop  # dockers reported by stats
$ rm ~/.docker/config.json
$ sudo service docker stop
Docker already stopped - file /var/run/docker-ssd.pid not found.
$ sudo service docker start

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

Compute 7.5 CUDA device: [GeForce RTX 2080 Ti]
69632 bodies, total time for 10 iterations: 117.762 ms
= 411.730 billion interactions per second
= 8234.591 single-precision GFLOP/s at 20 flops per interaction

cktlco commented 4 years ago

I confirm the same issue exists in 20231.1000

I confirm the issue is resolved in 20236.1000. Thanks to all who contributed momentum toward this.

wanfuse123 commented 4 years ago

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare -fp64 works, so it seems, with output like:

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance)
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation)
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Windowed mode
Simulation data stored in video memory
Double precision floating point simulation
1 Devices used for simulation
GPU Device 0: "GeForce GTX 1050 Ti" with compute capability 6.1

Compute 6.1 CUDA device: [GeForce GTX 1050 Ti]
4096 bodies, total time for 10 iterations: 115.717 ms
= 1.450 billion interactions per second
= 43.495 double-precision GFLOP/s at 30 flops per interaction

while this container fails still with the error

docker run --runtime=nvidia --rm -ti -v "${PWD}:/app" nricklin/ubuntu-gpu-test

modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.128-microsoft-standard/modules.dep.bin'
test.cu(29) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.

This is with the latest 20231.1000 installed

Lastly, the container mentioned below, which seems like a very good test, still shows errors, but I think a kernel compile option that I am unaware of will take care of it. I will investigate it further.

I am posting this second run line for others to use for testing the video card, since nvidia-smi is still not working and everything seems to suggest using nvidia-smi in the containers for examples. (Can't wait for the nvidia-smi thing to be fixed)

Anyway, here is the command:

docker run --runtime=nvidia --rm -ti -v "${PWD}:/app" tensorflow/tensorflow:1.13.2-gpu-py3-jupyter python /app/benchmark.py gpu 10000

(note: I am chaining the nvidia runtime container to a second container. I don't see very many mentions of how to do this without docker-compose. Helpful?)

It produces output like:

Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2020-10-15 02:16:20.656683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-10-15 02:16:20.658946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-15 02:16:20.659002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-10-15 02:16:20.659019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-10-15 02:16:20.659493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.

As you can see, NUMA support seems to be missing. This might not be a kernel compile problem but rather a TensorFlow version problem, like the one mentioned here:

https://stackoverflow.com/questions/55511186/could-not-identify-numa-node-of-platform-gpu (at the very bottom)

I am still investigating. Hope these test containers help someone else.
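The NUMA warnings above can also be probed directly from the sysfs path the logs reference. The following is a small sketch (not from the thread) that reads each PCI device's `numa_node` file and reports when the kernel does not expose one, which is the situation the TensorFlow warning describes:

```python
import glob
import os

def read_numa_node(pci_sysfs_dir):
    """Return the NUMA node recorded for a PCI device, or None if the
    kernel does not expose a readable numa_node file (as on WSL2
    kernels apparently built without NUMA support)."""
    path = os.path.join(pci_sysfs_dir, "numa_node")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    # On a stock Linux host this prints 0 (or the real node) per device;
    # inside WSL2 the file is typically absent, so None is printed.
    for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
        print(dev, "->", read_numa_node(dev))
```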

markdjthomas commented 4 years ago

I’m still running into the "all CUDA-capable devices are busy or unavailable" error after updating to 20236, with driver 460.15 on Ubuntu 20.04.

torch.cuda.is_available() returns True but I can’t move any variables to the device.

tadam98 commented 4 years ago

docker-compose does not support GPU docker containers. In order to use GPU containers:

  1. uninstall docker-compose
  2. pip install git+https://github.com/yoanisgil/compose.git@device-requests
  3. Add these lines to your docker-compose file:
version: "3.7"
...
    device_requests:
      - capabilities:
        - "gpu"

You may get the error "docker.errors.InvalidVersion: device_requests param is not supported in API versions < 1.40", which is fixed by setting the environment variable COMPOSE_API_VERSION=1.40.

With this modified docker-compose, gpu parameter will be passed to the docker container.

It is not a Microsoft problem but a docker-compose problem. With the modified docker-compose it works well.
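Putting the steps above together, a complete compose file using the patched fork's `device_requests` key might look like the following sketch (the service name and image are illustrative, not from this thread):

```yaml
version: "3.7"
services:
  gpu-test:
    image: tensorflow/tensorflow:latest-gpu
    # device_requests is only understood by the patched docker-compose
    # fork installed in step 2; stock docker-compose will reject it.
    device_requests:
      - capabilities:
          - "gpu"
```

Run it with COMPOSE_API_VERSION=1.40 exported in the environment, as noted above.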

keenranger commented 4 years ago

> docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare -fp64 works, so it seems [...] while this container fails still with the error [...] I am still investigating. Hope these test containers help someone else.

I had the same trouble until 20231. CUDA might not work with build 20231; try it with build 20236.

CUDA works. In my case, it works well after the update.

noofaq commented 4 years ago

It is also fixed for me in 20236 so I am closing the issue. Thanks to all commenters and also WSL developers for fixing the issue.

manishkm commented 4 years ago

Yes, CUDA on WSL is now working in 20236. I am creating a restore point in case this issue arises again in the future.

hanxiaotian commented 4 years ago

I'm using 20241, but CUDA is still not working in my WSL. Previously, when I used 20176, it worked.

>>> import torch
>>> torch.cuda.current_device()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xiaothan/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/cuda/__init__.py", line 377, in current_device
    _lazy_init()
  File "/home/xiaothan/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
    _check_driver()
  File "/home/xiaothan/miniconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/cuda/__init__.py", line 101, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

borgarpa commented 4 years ago

> I'm using 20241, but CUDA is still not working in my WSL. Previously, when I used 20176, it worked [...] AssertionError: Found no NVIDIA driver on your system.

I think they broke WSL-CUDA again... It did not work for me either. I tried reinstalling WSL, CUDA and the NVIDIA driver, but WSL just failed to identify both the NVIDIA driver and CUDA.

tadam98 commented 4 years ago

I have just updated to 20241 and

  1. nvidia docker works fine
  2. tensorflow reports my GPU (2080 Ti) as available (True)
  3. tensorflow reports the device name correctly

I suggest you check whether your CUDA version was updated and no longer matches your TF version. If so, scroll back - there is a procedure for installing any previous CUDA version.

For me there is no problem with 20241. Best, Mickey

$ docker ps
$ docker stop # dockers reported by ps
$ rm ~/.docker/config.json
$ sudo service docker stop
$ sudo service docker start
$ sudo mkdir /sys/fs/cgroup/systemd
$ sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
$ docker ps
$ docker run hello-world
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

> Compute 7.5 CUDA device: [GeForce RTX 2080 Ti]
69632 bodies, total time for 10 iterations: 118.318 ms
= 409.797 billion interactions per second
= 8195.939 single-precision GFLOP/s at 20 flops per interaction

The nvidia docker works fine.

And so does Tensorflow

$ python
> import tensorflow as tf
> tf.test.is_gpu_available()
Your kernel may have been built without NUMA support.
2020-10-25 00:21:10.409370: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555df5500de0 executing computations on platform CUDA. Devices:
2020-10-25 00:21:10.409404: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
True
> tf.test.gpu_device_name()
2020-10-25 00:22:14.701525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 9630 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
'/device:GPU:0'

binarycrayon commented 4 years ago

Works for me, but by the way, I reinstall the NVIDIA driver 460 every time after an update.

borgarpa commented 4 years ago

Ok! Solved it! Sorry if I misguided anybody. 😉 My NVIDIA driver and Ubuntu distro got messed up after the update. Re-installing the NVIDIA driver and repairing Ubuntu did the trick:

user@DESKTOP:/usr/local/cuda/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Pascal" with compute capability 6.1

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 1.083354 msec
Effective memory bandwidth: 73.844778 GB/s
Gigaoptions per second    : 7.384478

BlackScholes, Throughput = 7.3845 GOptions/s, Time = 0.00108 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

wanfuse123 commented 4 years ago

Can you paste the contents of BlackScholes?

Seems like a good test script!

Thanks!

hanxiaotian commented 4 years ago

> Ok! Solved it! Sorry if I misguided anybody. 😉 My NVIDIA driver and Ubuntu distro got messed up after the update. Re-installing NVIDIA and repairing Ubuntu did the trick [...] Test passed

Which NVIDIA driver are you using? My NVIDIA driver version is 460.20 with Ubuntu 20.04, and it still didn't work.

strarsis commented 4 years ago
msharifian commented 4 years ago

> Ok! Solved it! Sorry if I misguided anybody. 😉 My NVIDIA driver and Ubuntu distro got messed up after the update. Re-installing NVIDIA and repairing Ubuntu did the trick [...] Test passed
>
> Which NVIDIA driver are you using? My NVIDIA driver version is 460.20 with Ubuntu 20.04, and it still didn't work.

Reinstall the NVIDIA driver and Ubuntu; that worked for me with 20241.

nu007a commented 4 years ago

> Ok! Solved it! Sorry if I misguided anybody. 😉 Re-installing NVIDIA and repairing Ubuntu did the trick [...] Test passed
>
> Which NVIDIA driver are you using? My NVIDIA driver version is 460.20 with Ubuntu 20.04, and it still didn't work.
>
> Reinstall the NVIDIA driver and Ubuntu; that worked for me with 20241.

Reinstall the NV driver (460.20_gameready_win10-dch_64bit_international) and restart win10. The problem has been resolved. win10: 20246.1; wsl2: Ubuntu 20.04.

tadam98 commented 4 years ago

NV driver 460.20 win10:20246.1;wsl2:Ubuntu 18.04

Today I am getting:

$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0001] error waiting for container: context canceled

Reboots do not help. Reinstalled https://developer.nvidia.com/46020-gameready-win10-dch-64bit-international.exe

Did not help.

Any idea ?

onomatopellan commented 4 years ago

@tadam98 what's the output of ls -la /dev/dxg from bash?

tadam98 commented 4 years ago

My whole computer crashed and windows insider could not boot. So it is back in the shop for a short while getting a disk massage or something.

tadam98 commented 4 years ago

Rebuilt my computer. Now the Windows insider version is 20251.1 (downgrade to 2.3.0 did not help)

On the windows host docker 2.5.0.0 was installed.

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

For some reason:

$ sudo service docker stop
docker: unrecognized service

It was not so in my previous system.

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled

And, as requested:

~$ ls -la /dev/dxg
crw-rw-rw- 1 root root 245, 0 Nov  8 05:49 /dev/dxg
onomatopellan commented 4 years ago

@tadam98 That docker you are running is probably Docker Desktop for Windows, which doesn't support GPU in WSL2 yet. What's the output of ls -la $(which docker)?

tadam98 commented 4 years ago

-rwxr-xr-x 1 root root 89213816 Oct 14 19:52 /usr/bin/docker

docker --version
Docker version 19.03.6, build 369ce74a3c
python
import tensorflow as tf
tf.__version__

'1.14.0'
tf.test.is_gpu_available()
2020-11-08 14:32:14.364926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.65 pciBusID: 0000:0a:00.0
True
tf.test.gpu_device_name()
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.65 pciBusID: 0000:0a:00.0
2020-11-08 14:32:25.081122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 9630 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:0a:00.0, compute capability: 7.5)
'/device:GPU:0'

As can be seen, the GPU is seen in wsl2. I just have the docker problem.

onomatopellan commented 4 years ago

@tadam98 Since that looks like a nvidia docker bug you should post the issue here.

nalpalhao commented 4 years ago

Hey, running on winver 20251.1 and driver 460.20. Following the guide on Cuda on WSL:

/usr/local/cuda/samples/4_Finance/BlackScholes$ sudo make
/usr/local/cuda-11.0/bin/nvcc -ccbin g++ -I../../common/inc -m64 -maxrregcount=16 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o BlackScholes.o -c BlackScholes.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas warning : For profile sm_70 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_75 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_80 adjusting per thread register count of 16 to lower bound of 24
/usr/local/cuda-11.0/bin/nvcc -ccbin g++ -I../../common/inc -m64 -maxrregcount=16 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o BlackScholes_gold.o -c BlackScholes_gold.cpp
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda-11.0/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o BlackScholes BlackScholes.o BlackScholes_gold.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp BlackScholes ../../bin/x86_64/linux/release

/usr/local/cuda/samples/4_Finance/BlackScholes$ sudo ./BlackScholes
[./BlackScholes] - Starting...
CUDA error at ../../common/inc/helper_cuda.h:777 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"

Any ideas on how to solve this?

tadam98 commented 4 years ago

If both of these run, you are usually fine:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

And,

python
import tensorflow as tf
tf.__version__
tf.test.is_gpu_available()
tf.test.gpu_device_name()

In my case the docker fails and the python works. You may need to alter the python commands slightly if the tf version is different.
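The TF1-style calls above (`tf.test.is_gpu_available()`) are deprecated in TF2, so "alter the python commands slightly" usually means switching APIs. A small version-agnostic sketch (the helper name is mine, not from the thread; it degrades gracefully if tensorflow is not importable):

```python
# Hypothetical helper, not from the thread: report visible GPUs on
# either the TF1 or TF2 API, without crashing if tensorflow is absent.
def describe_gpus():
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow not installed"
    if hasattr(tf.config, "list_physical_devices"):  # TF >= 2.1 path
        gpus = tf.config.list_physical_devices("GPU")
        return f"{len(gpus)} GPU(s): {[g.name for g in gpus]}"
    return f"GPU available: {tf.test.is_gpu_available()}"  # TF1 path

print(describe_gpus())
```

On a working WSL2 + CUDA setup the TF2 path should list one device per GPU; an empty list points back at the driver problems discussed in this thread.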

tadam98 commented 4 years ago

Microsoft just released 20257.1 any opinions ?

ytwytw commented 3 years ago

I have 20257 and getting cudaErrorInsufficientDriver error

tadam98 commented 3 years ago

This makes it work. Basically install docker on wsl2. https://dev.to/bartr/install-docker-on-windows-subsystem-for-linux-v2-ubuntu-5dl7

After restart of wsl2 do the following procedure:

$ docker ps
$ docker ps -aq
$ docker stop $(docker ps -aq)
$ docker stop # dockers reported by ps
$ rm ~/.docker/config.json
$ sudo service docker stop
$ sudo service docker start
$ sudo mkdir /sys/fs/cgroup/systemd
$ sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
$ docker ps
$ docker run hello-world
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

> Compute 7.5 CUDA device: [GeForce RTX 2080 Ti]
69632 bodies, total time for 10 iterations: 117.583 ms
= 412.357 billion interactions per second
= 8247.147 single-precision GFLOP/s at 20 flops per interaction

One more thing you need to know: the usual docker-compose that gets installed with docker (latest is version 1.27) does not support starting containers with a GPU. The only way to manage GPU containers with compose is an experimental version of docker-compose that supports device requests.

$ which docker-compose
$ sudo rm -rf <folder of docker-compose>
$ pip install git+https://github.com/yoanisgil/compose.git@device-requests
$ docker-compose --version
docker-compose version 1.26.0dev, build unknown
# to use it you have to make sure that API version is 1.40. This can be done on the same command-line:
$ COMPOSE_API_VERSION=1.40 docker-compose up
    NVIDIA_DRIVER_PRESENT: "True"
    container_name: docker_name_needing_gpu
    device_requests:
        - capabilities:
            - "gpu"
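Pieced together, a minimal compose file for that experimental build might look like the sketch below. This assumes the yoanisgil `device-requests` fork linked above; the service name and image are placeholders, and it only works when run as `COMPOSE_API_VERSION=1.40 docker-compose up`:

```yaml
version: "3.7"
services:
  gpu_service:                                # placeholder service name
    image: tensorflow/tensorflow:latest-gpu   # any CUDA-capable image
    container_name: docker_name_needing_gpu
    environment:
      NVIDIA_DRIVER_PRESENT: "True"
    device_requests:                          # only in the experimental fork
      - capabilities:
          - "gpu"
```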
tadam98 commented 3 years ago

Windows insider updated to 20262.1. With the update, the regular nvidia driver was re-installed, which is not good for wsl2. Also, nvidia issued a new driver that you should install in Windows: https://developer.nvidia.com/cuda/wsl. The new driver is 465.12. Download and install it. You must reboot after the install or wsl2 will not see the GPU yet. Then start wsl2:

$ python
> import tensorflow as tf
>  tf.test.is_gpu_available()
2020-11-19 17:15:13.395676: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
True

Then do the above docker procedures to check that docker also works. In my case, it works fine.

serg06 commented 3 years ago

Also, nvidia issued a new driver that you should install in windows: https://developer.nvidia.com/cuda/wsl The new driver is 465.12. Download and install it. You must reboot after the install or wsl2 will not see the gpu as of yet.

After doing this, it still didn't work for me.

I was able to get it to work by following it up with this:

  • Installed everything else that Windows Update wanted to install and rebooted
  • Reinstalled the 465.12 driver
    • On the first screen, I selected "Drivers + GeForce Experience"
    • On the second screen, I selected Custom, then pressed Next and checked "Perform a clean install"
  • Uninstalled all my existing Ubuntu installations
  • Restarted Windows
  • Installed Ubuntu from Microsoft store (the one called "Ubuntu" with no version) (it installed Ubuntu 20.04)
  • Restarted Windows
  • Followed these instructions to add the sources to my sources list, but I modified the URLs to be 2004 instead of 1804:
    • sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    • sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
    • sudo apt-get update
  • Continued to follow the instructions by installing Nvidia Toolkit 11.1 (not 11.0)
    • sudo apt-get install -y cuda-toolkit-11-1
  • Followed these instructions to test the installation
    • wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    • bash Miniconda3-latest-Linux-x86_64.sh
    • When it asks whether to init conda, hit yes
    • restart terminal
    • conda config --set auto_activate_base false
    • restart terminal
    • conda create --name directml python=3.6
    • conda activate directml
    • pip install tensorflow-directml
  • Then in Python
    • import tensorflow.compat.v1 as tf
    • tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
    • print(tf.add([1.0, 2.0], [3.0, 4.0]))

And there, I finally saw the name of my GPU pop up!

After that I went back and tried the ./BlackScholes test:

  • cd /usr/local/cuda/samples/4_Finance/BlackScholes
  • sudo make
  • ./BlackScholes

And it also worked!

tadam98 commented 3 years ago

Hi @serg06,

When you said "After doing this, it still didn't work for me," did you also follow the docker instructions one post above? (I am working with CUDA 10.0 and it is working fine, using other examples since 10.0 does not have BlackScholes. Actually the CUDA version does not matter at all: once Python and the nvidia docker container see the GPU, the rest is details.)

Best, Mickey

serg06 commented 3 years ago

Hi @tadam98 , no I didn't try that.

tadam98 commented 3 years ago

@serg06 Interesting.

I see that you are not using the nvidia docker container. The procedure is not needed if you do not use it and do not use docker containers that need a GPU.

Are you using Docker Desktop exclusively, without the additional installation of docker inside Ubuntu as in the post above?

Just at the tensorflow level, tensorflow-gpu works fine for me, without the need for tensorflow-directml. Since I am still using version 1.14, I had to install CUDA 10.x and cuDNN 7.6.5, which work fine.

Best, Mickey

satyajitghana commented 3 years ago

I just updated to 20262.1010, reinstalled nvidia driver 465.12, installed Ubuntu 20.04 WSL2, and set it up accordingly (used the CUDA WSL2 tutorial and replaced 1804 with 2004), and it just worked! Even my previous Ubuntu 18.04 started working for CUDA (it wasn't working on the older build). Thanks to @serg06!

alchemistake commented 3 years ago

How can I track when this fix is merged to the mainline? I don't want to run insider builds :)

tadam98 commented 3 years ago

Oh, if you wait another 3-6 months it may be integrated into the main builds.

ArieTwigt commented 3 years ago

After doing this, it still didn't work for me. I was able to get it to work by following it up with this: (serg06's full step-by-step list above)

Thanks for posting your steps. I also run Ubuntu in WSL2 and use the Nvidia docker containers as explained in https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-drivers I also faced the same problem (nvcr.io/nvidia/tensorflow not working in WSL 2 after the update of Windows Insider version).

From your steps, I only had to re-install the CUDA driver ( https://developer.nvidia.com/cuda/wsl ). In my case I just had to run the installer like I normally did. I didn't even have to restart WSL.

I ran the benchmark container to check if everything works:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

The benchmark passed and my nvcr.io/nvidia/tensorflow container works again.

wanfuse123 commented 3 years ago

I can confirm that installing the latest docker edge, Microsoft Edge packages, and NVIDIA 465.12 makes it all work again.

I had stopped working on this for a month because an update killed it; it was badly broken.

Fortunately it is working again.

Even better than before actually because now NVIDIA DOCKER can run in multiple containers simultaneously!!!

wanfuse123 commented 3 years ago

also can confirm that nvidia-smi seems to at least partially work once again!

tadam98 commented 3 years ago

Updated to Insider version 20270.1 still with 465.12 and all is working well as before.

tadam98 commented 3 years ago

@ArieTwigt Not sure how it worked for you, as you did not include "experimental" in the .list as in the instructions:

curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
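That command assumes `$distribution` is already set; the NVIDIA guides derive it from `/etc/os-release`. A sketch that computes the tag and echoes the resulting list URL instead of fetching it:

```shell
#!/bin/sh
# Derive the distro tag the NVIDIA repo layout expects, e.g. "ubuntu20.04".
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
echo "distribution=$distribution"
# The curl | sudo tee line above would then fetch this URL:
echo "https://nvidia.github.io/libnvidia-container/experimental/${distribution}/libnvidia-container-experimental.list"
```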