Did you reinstall nvidia-docker2 after the rollback?
sudo apt-get update
sudo apt-get install -y --reinstall nvidia-docker2
Oh, and did you run sudo service docker start?
It was the first-time install of nvidia-docker2 anyway. Here is what I did:
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
ubuntu18.04
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
OK
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Do not skip the experimental
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
# restart Docker Desktop (it's WSL2 on the PC)
# actually - rebooted the PC.
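A quick sanity check worth doing at this point (assuming the stock nvidia-docker2 package layout): the package drops an /etc/docker/daemon.json that registers the "nvidia" runtime, and dockerd only picks it up after a restart.
$ cat /etc/docker/daemon.json          # should list "nvidia" under "runtimes"
$ sudo service docker restart
$ docker info | grep -i runtimes       # should now mention nvidia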
Here is what I did now:
mickey@MICKEY-2080TI:~$ sudo apt-get update -y
Hit:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Ign:2 http://ppa.launchpad.net/videolan/stable-daily/ubuntu bionic InRelease
Hit:3 http://dl.google.com/linux/chrome/deb stable InRelease
Err:4 http://ppa.launchpad.net/videolan/stable-daily/ubuntu bionic Release
404 Not Found [IP: 91.189.95.83 80]
Hit:5 http://packages.microsoft.com/repos/vscode stable InRelease
Err:6 http://debian.sourcegear.com/ubuntu bionic InRelease
403 Forbidden [IP: 52.216.186.90 80]
Hit:7 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:8 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Ign:9 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
Hit:10 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release
Get:11 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:13 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Hit:14 https://download.docker.com/linux/ubuntu bionic InRelease
Hit:15 https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64 InRelease
Hit:16 https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/amd64 InRelease
Hit:17 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease
Hit:18 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease
Hit:19 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Hit:20 https://packages.lunarg.com/vulkan bionic InRelease
Reading package lists... Done
E: The repository 'http://ppa.launchpad.net/videolan/stable-daily/ubuntu bionic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Failed to fetch http://debian.sourcegear.com/ubuntu/dists/bionic/InRelease 403 Forbidden [IP: 52.216.186.90 80]
E: The repository 'http://debian.sourcegear.com/ubuntu bionic InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/libnvidia-container-experimental.list:1 and /etc/apt/sources.list.d/nvidia-container-runtime.list:1
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/libnvidia-container-experimental.list:1 and /etc/apt/sources.list.d/nvidia-container-runtime.list:1
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/libnvidia-container-experimental.list:1 and /etc/apt/sources.list.d/nvidia-container-runtime.list:1
W: Target Translations (en) is configured multiple times in /etc/apt/sources.list.d/libnvidia-container-experimental.list:1 and /etc/apt/sources.list.d/nvidia-container-runtime.list:1
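Two side notes on that output, for anyone hitting the same messages. The 404/403 errors come from unrelated third-party repos (the videolan PPA and sourcegear), not from the NVIDIA setup. The "configured multiple times" warnings mean two .list files point apt at the same NVIDIA repository; a minimal cleanup, assuming you keep the libnvidia-container-experimental entries, would be:
$ sudo add-apt-repository --remove ppa:videolan/stable-daily      # drop the dead PPA
$ sudo rm /etc/apt/sources.list.d/nvidia-container-runtime.list   # duplicates libnvidia-container-experimental.list
$ sudo apt-get update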
mickey@MICKEY-2080TI:~$ sudo apt-get install -y --reinstall nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
cuda-11-1 cuda-command-line-tools-11-1 cuda-compiler-11-1 cuda-cudart-11-1 cuda-cudart-dev-11-1 cuda-cuobjdump-11-1
cuda-cupti-11-1 cuda-cupti-dev-11-1 cuda-demo-suite-11-1 cuda-documentation-11-1 cuda-driver-dev-11-1 cuda-gdb-11-1
cuda-libraries-11-1 cuda-libraries-dev-11-1 cuda-memcheck-11-1 cuda-nsight-11-1 cuda-nsight-compute-11-1
cuda-nsight-systems-11-1 cuda-nvcc-11-1 cuda-nvdisasm-11-1 cuda-nvml-dev-11-1 cuda-nvprof-11-1 cuda-nvprune-11-1
cuda-nvrtc-11-1 cuda-nvrtc-dev-11-1 cuda-nvtx-11-1 cuda-nvvp-11-1 cuda-runtime-11-1 cuda-samples-11-1
cuda-sanitizer-11-1 cuda-toolkit-11-1 cuda-tools-11-1 cuda-visual-tools-11-1 golang-docker-credential-helpers
libcublas-11-1 libcublas-dev-11-1 libcufft-11-1 libcufft-dev-11-1 libcurand-11-1 libcurand-dev-11-1 libcusolver-11-1
libcusolver-dev-11-1 libcusparse-11-1 libcusparse-dev-11-1 libnpp-11-1 libnpp-dev-11-1 libnvjpeg-11-1
libnvjpeg-dev-11-1 nsight-compute-2020.2.0 nsight-systems-2020.3.4 python-backports.ssl-match-hostname
python-cached-property python-certifi python-chardet python-docker python-dockerpty python-dockerpycreds
python-docopt python-funcsigs python-functools32 python-jsonschema python-mock python-openssl python-pbr
python-requests python-texttable python-urllib3 python-websocket python-yaml
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 8 not upgraded.
Need to get 0 B/5912 B of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 205537 files and directories currently installed.)
Preparing to unpack .../nvidia-docker2_2.5.0-1_all.deb ...
Unpacking nvidia-docker2 (2.5.0-1) over (2.5.0-1) ...
Setting up nvidia-docker2 (2.5.0-1) ...
mickey@MICKEY-2080TI:~$ sudo service docker start
 * Starting Docker: docker
mickey@MICKEY-2080TI:~$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
mickey@MICKEY-2080TI:~$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
mickey@MICKEY-2080TI:~$
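For what it's worth, that daemon error usually means dockerd cannot find the NVIDIA container runtime at all. A hedged checklist, assuming the standard package names:
$ which nvidia-container-runtime nvidia-container-toolkit   # both binaries should resolve
$ dpkg -l | grep -E 'nvidia-(docker2|container)'            # packages should be installed
$ sudo service docker restart                               # dockerd must restart after the install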
And, checking the gpu under tensorflow works fine (see the end):
(blurmvp3.7g) mickey@MICKEY-2080TI:~$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
(repeated numpy FutureWarning messages from tensorflow .../dtypes.py and tensorboard .../dtypes.py trimmed)
>>> tf.__version__
'1.14.0'
>>> tf.test.is_gpu_available()
2020-10-08 17:05:37.772456: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-08 17:05:38.034860: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56364a3d9840 executing computations on platform Host. Devices:
2020-10-08 17:05:38.064028: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-10-08 17:05:38.560420: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:09:00.0/numa_node Your kernel may have been built without NUMA support. (this warning repeats several times below)
2020-10-08 17:05:38.560712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.65 pciBusID: 0000:09:00.0
2020-10-08 17:05:38.577694: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
(libcublas.so.10.0, libcufft.so.10.0, libcurand.so.10.0, libcusolver.so.10.0, libcusparse.so.10.0 and libcudnn.so.7 all opened successfully as well)
2020-10-08 17:05:41.382892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-10-08 17:05:41.741039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-08 17:05:41.741080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2020-10-08 17:05:41.741116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2020-10-08 17:05:41.744115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 9630 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5)
2020-10-08 17:05:41.758000: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56364e3c7ac0 executing computations on platform CUDA. Devices:
2020-10-08 17:05:41.758038: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
True
I ran a very heavy GPU/ML workload in this environment and it works perfectly. My only problem is that Docker complains:
mickey@MICKEY-2080TI:~$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
I did - but ... it does not always work:
$ sudo service docker stop
$ sudo service docker start
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5
> Compute 7.5 CUDA device: [GeForce RTX 2080 Ti]
69632 bodies, total time for 10 iterations: 112.230 ms
= 432.026 billion interactions per second
= 8640.519 single-precision GFLOP/s at 20 flops per interaction
This solved it - works every time:
$ sudo service docker stop
$ sudo service docker start
$ sudo mkdir /sys/fs/cgroup/systemd
$ sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
$ docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter
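To avoid retyping this after every WSL restart, the same workaround can be wrapped in a small guard (the mkdir/mount commands are taken verbatim from above; the conditional and the use of mountpoint are my addition), e.g. in ~/.bashrc or a helper script:
# Re-mount the systemd cgroup hierarchy if WSL2's init hasn't provided it,
# then restart dockerd so it sees the mount.
if ! mountpoint -q /sys/fs/cgroup/systemd; then
    sudo mkdir -p /sys/fs/cgroup/systemd
    sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
fi
sudo service docker restart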
Seems this issue has been tagged "fix inbound" for quite a while, but it is not even mentioned in the 20231 release notes (not even in the known-issues part).
Could more attention be paid to this issue? This is the main blocker for me as a CUDA programmer.
@FluorineDog sweet summer child, getting an MS response at all is an anomaly, and 3 days old is milliseconds (been waiting on https://github.com/microsoft/WSL/issues/4150 for 1yr). I anticipate staying in this rolled back build for 3-6mo. See this for disabling auto update/restart (though not sure if Dev Channel will auto update if you've rolled back). Anyway, definitely do the rollback soon, and don't twiddle your thumbs on a fix.
@FluorineDog "fixinbound" means the fix will appear in an incoming insider build in 2~4 weeks. Once you see "fixedininsiderbuilds" tag will mean the bug is fixed in latest insider build.
I hope so, @onomatopellan !
> @FluorineDog sweet summer child, getting an MS response at all is an anomaly, and 3 days old is milliseconds (been waiting on #4150 for 1yr). I anticipate staying in this rolled back build for 3-6mo. See this for disabling auto update/restart (though not sure if Dev Channel will auto update if you've rolled back). Anyway, definitely do the rollback soon, and don't twiddle your thumbs on a fix.
Recent previous builds trigger another fatal bug, while older ones don't even deliver the needed functionality. That's why I'm checking this thread constantly. Once I can run my first CUDA program, I'll gladly freeze my Dev Channel, but sadly not now.
I have the same problem in 20226. My build also contains the same 8 files in lxss\lib, but I get cudaErrorDevicesUnavailable. Is there a way to roll back to 20221? Using "Go back to previous version of Windows 10" sends me to 19041.508.
Yes, you can install 20221 from https://www.microsoft.com/en-us/software-download/windowsinsiderpreviewadvanced
I'm trying to downgrade but can't find a way; in the provided link, version 20221 is not in the multiselect at the bottom. Any tips on how to downgrade?
@michelemoretti They updated the latest official ISO to build 20231 only. You can still generate an x64 ISO of the build you like with sites like https://uup.rg-adguard.net or https://uupdump.ml/
+1, happens to me on 20231 as well.
Same problem
> @michelemoretti They updated the latest official ISO to build 20231 only. You can still generate an x64 ISO of the build you like with sites like https://uup.rg-adguard.net or https://uupdump.ml/
I've installed both 20226 and 20231. Later I realized that CUDA failed on WSL2. I cannot revert to 20221, only 20226. Is it safe to install 20221 from an ISO downloaded from these sites while on 20226?
It used to be here: https://www.microsoft.com/en-us/software-download/windowsinsiderpreviewadvanced But now 20221 is not there any more. I "paused" updates for 7 days, hoping that in the meantime Microsoft will fix the NVIDIA problem.
This is the link I used. Unfortunately I did not keep the image. https://software-download.microsoft.com/db/Windows10_InsiderPreview_Client_x64_en-us_20201.iso?t=316defb4-045b-4f87-82cb-e2e201cdca3a&e=1602073124&h=698ebdb68b3a19ab77b28256c9a826b2
Updated to preview build 20231; it seems this issue is still not solved.
@tadam98 That link won't work anymore.
@basarane I found a better place to download the official Insider build 20201 ISO: https://tb.32767.ga/get.php?id=1727. Just make sure you are logged in to the Windows Insider site before pressing the Confirm button.
20221.1000 is the version you want. I have it installed and it is working well. It does suffer from the reported problem that WSL2 loses internet once in a while (and a reboot is needed), but nvidia-docker2 works well on it.
Build 20201 should be a good build to stop at too. CUDA in WSL2 has worked since build 20145.
That's before my time :) All insider versions are here: https://tb.32767.ga/products.php?prod=win10ip
@tadam98 That's the server version only. Official client Insider ISOs are 20201 and 20231. See Flight Hub.
Same issue for me on Build 20231, WSL2 and GTX 1650:
root@LAPTOP:/usr/local/cuda/samples/4_Finance/BlackScholes# ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Turing" with compute capability 7.5
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
CUDA error at BlackScholes.cu:116 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_CallResult, OPT_SZ)"
No GPU is listed after issuing the command lspci on Ubuntu 18.04:
978b:00:00.0 3D controller: Microsoft Corporation Device 008e
ab50:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
abff:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
ae20:00:00.0 3D controller: Microsoft Corporation Device 008e
bf2d:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
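For what it's worth, under WSL2 the GPU is not expected to appear under its own PCI ID; if my understanding is correct, it is exposed through those generic "3D controller: Microsoft Corporation Device 008e" entries and through the /dev/dxg node:
$ lspci | grep '3D controller'   # the paravirtualized GPU adapters
$ ls -l /dev/dxg                 # GPU access from WSL2 goes through this device node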
The TensorFlow function list_local_devices returns:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13820027289368611552
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11825873724132199309
physical_device_desc: "device: XLA_CPU device"
]
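For reference, a minimal snippet (standard TensorFlow 1.x API; the one-liner form is my addition) that produces a device list like the one above:
$ python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"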
I've been trying different solutions for three days now, but it seems there's none apart from downgrading... I get the same error as in the above posts with both cuda-toolkit-11-0 and cuda-toolkit-11-1 on my small NVIDIA GeForce MX150.
The error constantly thrown (for the BlackScholes example given in the CUDA on WSL guide, https://docs.nvidia.com/cuda/wsl-user-guide/index.html) is:
CUDA error at BlackScholes.cu:116 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **)&d_CallResult, OPT_SZ)"
I bookmarked this page and am following updates daily!
Same issue here. If you are on build 20231, I don't suggest downgrading because it introduces a new error with user accounts. If you want to try, please make a backup first.
Same issue here on the 20231 version; this needs to be fixed soon!
Piling on. When I run this Python code on build 20231 (WSL2, Ubuntu 20.04, RTX 2080 Super, NVIDIA driver 460.20) I get the "all CUDA-capable devices are busy or unavailable" error:
import torch
torch.rand(500,500,500).cuda()
Going back to build 20201 fixed this issue.
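A one-liner form of the same repro (standard PyTorch calls only) that also prints whether CUDA is visible at all before the allocation fails:
$ python -c "import torch; print(torch.cuda.is_available()); torch.rand(500, 500, 500).cuda()"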
@mitchellvitez I really don't think going back to 20201 is a safe move, considering the previous bugs in 20201 as well.
I hope this is fixed soon enough
I downgraded to 20221, and the WSL 2 I installed showed the error message "remote procedure call failed".
@ccs96307 Now what can we say... phew, solving one problem leads to another.
Same problem on 20231. The CUDA samples error out, for example matrixMul:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Turing" with compute capability 7.5
MatrixA(320,320), MatrixB(640,320)
CUDA error at matrixMul.cu:130 code=46(cudaErrorDevicesUnavailable) "cudaMallocHost(&h_A, mem_size_A)"
> Build 20201 should be a good build to stop at too. CUDA in WSL2 has worked since build 20145.
How can you work on build 20201, since 20201 will have the "remote procedure call" error when you start up WSL2?
20221 works well for me. Wsl2 looses network once in a while but I can live with it.
I paused the updates for 7 days.
Build 20231 doesn't work for me
> How can you work on build 20201, since 20201 will have the "remote procedure call" error when you start up WSL2?
@tommywu052 Wasn't 20211 the build that had that issue, #5907? Anyway, it's hard to name a build where everything works for everyone. In my case I never had that issue nor this thread's issue.
I did not encounter #5907, maybe because I use Ubuntu 18.04.
I am successfully working with 20221.1000.
I'm also having this problem on 20231 with WSL Ubuntu 20.04. Just in case anyone wants to save time by not trying this, as I just did: I installed Ubuntu 18.04 as a separate distribution, installed CUDA, rebuilt the examples from source, and attempted to run BlackScholes. It seems that doesn't make a difference - same error.
Since I just moved to the Dev Channel today for CUDA support, I don't have the luxury of rolling back. I have also been dealing with a problem with CUDA on my Linux install, so I was hoping this would be worth the effort (guess not). Hope this is fixed soon... it seems installing CUDA is always an absolute crap experience.
I have the same bleeping problem running CUDA:
docker run --runtime=nvidia --rm -ti -v "${PWD}:/app" nricklin/ubuntu-gpu-test
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/4.19.128-microsoft-standard/modules.dep.bin'
test.cu(29) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.
I wonder if the module problem is causing the CUDA-capable error
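The modprobe failure by itself is probably a red herring: as far as I know, the Microsoft WSL2 kernel (4.19.128-microsoft-standard) ships no loadable-module tree, and GPU access is supposed to go through the /dev/dxg paravirtualization node instead. A hedged check:
$ uname -r        # confirms the Microsoft kernel, which has no modules.dep
$ ls -l /dev/dxg  # should exist when GPU passthrough is working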
From what I understand you can't use the kernel headers in the container; does MS provide these headers in some other fashion? This seems to have been broken for me for at least a week (I did not notice).
Can't roll back to 20221.1000 to test, unfortunately; hope this is fixed soon...
20231.1005 is available now; will it fix this problem? https://blogs.windows.com/windows-insider/2020/10/07/announcing-windows-10-insider-preview-build-20231/
According to the page, it seems to not fix anything.
> 20231.1005 is available now; will it fix this problem? https://blogs.windows.com/windows-insider/2020/10/07/announcing-windows-10-insider-preview-build-20231/
> According to the page, it seems to not fix anything.
No, 20231.1005 still doesn't work
They mentioned this problem in the "CUDA on WSL" guide by NVIDIA. Currently the only suggested solution is reverting back to the 20221 build. Is there any safe and easy way, or at least any guide, to do it?
Ok, so what is the safest way to revert to build 20221?
It's not available via the official methods: reverting to the previous build or the advanced options in the Insiders panel.
Do we need to install from a third-party ISO? If so, which ones are safe?
Also, have we had any official comment from a WSL maintainer?
Thanks!
Is there any workaround since the build 20221 image is no longer available here?
Or any ideas on when this will likely be fixed?
You can find the ISO for 20221 here; check in the comments, @theothings @jamespacileo:
https://forums.developer.nvidia.com/t/code-46-error-device-unreachable/156739
In the comments there you will find an (unofficial) link, because 20221 was unavailable in the Windows Insiders downloads menu (it shows later versions for download)!
> Is there any workaround since the build 20221 image is no longer available here?
> Or any ideas on when this will likely be fixed?
You can download and create ISOs of previous Windows builds, including 20221, at https://uupdump.ml
Won't installing 20221 through boot-and-reinstall destroy the Ubuntu installations on WSL and all the NVIDIA and Docker configuration? I know you can reinstall and preserve applications, but are those included?
Environment
Steps to reproduce
I exactly followed the instructions available at https://docs.nvidia.com/cuda/wsl-user-guide/index.html. Tested on a previously working Ubuntu WSL image (IIRC the GPU last worked on 20206, then the whole WSL2 stopped working). Also tested on newly created Ubuntu 18.04 and Ubuntu 20.04 images.
I have tested the CUDA-compatible NVIDIA drivers 455.41 & 460.20. I have tried removing all drivers, etc. I have also tested using CUDA 10.2 & CUDA 11.0.
It was tested on two separate machines (one Intel + GTX 1060, the other Ryzen + RTX 2080 Ti).
The issue was tested directly in the OS and also in Docker containers inside it.
Example (directly in Ubuntu):
Example in container:
Expected behavior
CUDA working inside WSL2
Actual behavior
All tests which use CUDA inside WSL Ubuntu result in various CUDA errors, mostly referring to no CUDA devices being available.