rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0
6.03k stars 285 forks

CUDA support (WSL) #3968

Open claudio4 opened 1 year ago

claudio4 commented 1 year ago

Problem Description

At the moment it seems like Rancher Desktop for Windows does not support NVIDIA CUDA. I have tried both the containerd and dockerd engines.

Executing nerdctl run --rm --gpus all nvidia/cuda:12.0.1-devel-ubuntu22.04 nvidia-smi fails with:

> nerdctl run --rm --gpus all nvidia/cuda:12.0.1-devel-ubuntu22.04 nvidia-smi
FATA[0000] exec: "nvidia-container-cli": executable file not found in $PATH

Meanwhile dockerd complains about the lack of a driver.

> docker run --rm --gpus all  nvidia/cuda:12.0.1-devel-ubuntu22.04 nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Proposed Solution

Rancher Desktop needs to check whether CUDA is available in WSL; if it is, it should install the NVIDIA Container Toolkit in Rancher's WSL distro.

Additional Information

This might be tricky, as the NVIDIA Container Toolkit does not support Alpine. I attempted to manually install the toolkit in the rancher-desktop distro, to no avail. The main roadblock is the lack of glibc. gcompat looks promising; I got the toolkit running, only for it to complain about the lack of CUDA.
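For reference, the attempt described above would look roughly like this inside the rancher-desktop WSL distro (a hedged sketch; gcompat is a real Alpine package, but the toolkit itself has no Alpine package, so its binaries have to be brought in by hand):

```shell
# Inside the rancher-desktop distro (Alpine):
# install the musl/glibc compatibility shim from the Alpine repos
apk add gcompat

# The toolkit binaries (nvidia-container-cli etc.) are not packaged
# for Alpine, so they would have to be dropped in manually, e.g.
# extracted from the Ubuntu .deb. Even with gcompat in place, they
# then fail looking for the CUDA driver libraries, as noted above.
```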

rne1223 commented 1 year ago

I believe that the current problem comes from the fact that Rancher Desktop's docker runs on top of busybox, and all the drivers that NVIDIA has put out are based on Ubuntu. Is there a way to run Rancher Desktop's docker daemon on Ubuntu 20.04?

jandubois commented 1 year ago

I believe that the current problem comes from the fact that Rancher Desktop's docker runs on top of busybox, and all the drivers that NVIDIA has put out are based on Ubuntu.

Yes. Alpine uses musl and Ubuntu uses glibc. You can install a glibc compatibility library on Alpine, but I don't know if this will give you CUDA.

The best I can find is https://arto.s3.amazonaws.com/notes/cuda. If you try this out and get CUDA running with Rancher Desktop, then please leave a note here with what you did!

Is there a way to run Rancher Desktop's docker daemon on Ubuntu 20.04?

No, this is not possible. Rancher Desktop makes specific assumptions about the VM images being used; they are custom-built for Rancher Desktop.

shikanime commented 1 year ago

Is there any contribution documentation I can consult to estimate the possibility of adding support for Ubuntu as an alternative backend to Busybox?

RadicalAcronym commented 8 months ago

I would also like to see Rancher Desktop on windows support GPUs. I am able to do this with Podman Desktop. For Podman Desktop I opened a shell prompt in the running container, e.g., podman-machine-default, and ran the following:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo |   sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
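Once the CDI spec has been generated, the GPU can be requested by its CDI device name (a hedged example; the exact device name comes from the `nvidia-ctk cdi list` output above, with `nvidia.com/gpu=all` being the name the generated spec typically exposes):

```shell
# Run a container with the GPU passed through via CDI;
# requires an NVIDIA GPU and the generated /etc/cdi/nvidia.yaml
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```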

I wanted to try that with Rancher Desktop, but I didn't get it to work -- I suppose in part because Rancher Desktop is based on an Alpine Linux image, which uses musl. I found a glibc package and installed it, but ran into a problem that I can't remember now. Maybe it was that I still didn't have an nvidia-container-toolkit installation for Alpine. I decided to keep using Podman for now.

It seems there are a few possible solutions: (1) get NVIDIA to write a container toolkit for Alpine Linux, (2) find the right way to install glibc and nvidia-container-toolkit in Alpine, (3) change Rancher Desktop to use, e.g., debian-slim.

jandubois commented 8 months ago

It seems there are a few possible solutions: (1) get NVIDIA to write a container toolkit for Alpine Linux,

This seems quite unlikely.

(2) find the right way to install glibc and nvidia-container-toolkit in alpine,

This would be the best short/medium term plan. I don't know if this is possible at all, but worth trying.

(3) change Rancher Desktop to use e.g., debian-slim.

This is not going to happen in the medium term (i.e. in 2024). I don't want to rule it out completely, but it would be a significant effort to do it right, and there is a lot of internal refactoring needed before we would attempt this.

choigawoon commented 4 months ago

I'd like to set up a Kubernetes environment with Rancher Desktop's k3s, but this forces me to use Docker Desktop instead. Nowadays GPU support is really needed. Really sad.

pradhyumna85 commented 1 month ago

@jandubois, have a look at this project: https://github.com/sgerrand/alpine-pkg-glibc. Also see this related thread: https://github.com/sgerrand/alpine-pkg-glibc/issues/199

I think this could help us add support for GPU (cuda)
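For reference, installing that project's glibc compatibility package on Alpine typically looks like this (a hedged sketch based on the project's README; the version shown is an example release, and whether the NVIDIA Container Toolkit then actually runs against this glibc inside the rancher-desktop distro is exactly the open question):

```shell
# Trust the package author's signing key, then install the glibc apk
# (example version; pick a current release from the project's page)
wget -q -O /etc/apk/keys/sgerrand.rsa.pub \
  https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub
wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.35-r1/glibc-2.35-r1.apk
apk add glibc-2.35-r1.apk
```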

Let me know what you think.

brian316 commented 3 weeks ago

No GPU support? Sadly ran into this trying to move away from Docker Desktop.

RadicalAcronym commented 3 weeks ago

For now, until this is fixed, I have installed Docker within WSL.

This doesn't provide me with the GUI and doesn't let me see my images/containers/etc. from windows, but it allows for use of docker and GPUs in containers in WSL without Rancher Desktop or Docker Desktop (or Podman Desktop).

From within WSL, uninstall old conflicting Docker packages, set up the apt repo, and install the latest version of Docker (see https://docs.docker.com/engine/install/ubuntu/ for exact commands).
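Condensed, the steps from that Docker install page look roughly like this (a sketch of the documented Ubuntu commands; check the linked page for the current version):

```shell
# Set up Docker's apt repo and signing key (per docs.docker.com)
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install the Docker engine itself
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```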

Within WSL, make sure you can see your GPU (this can be done before installing Docker):

nvidia-smi

should return something like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10    Driver Version: 535.86.10    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Then install nvidia-container-toolkit using the commands found here (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html): configure the repo with curl, run apt-get update, then apt install nvidia-container-toolkit. Then configure Docker with an nvidia-ctk command and restart it with systemctl (see the link for exact commands).
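Condensed, the steps from the NVIDIA install guide look roughly like this (a sketch of the documented apt-based setup; follow the linked guide for the current repo configuration):

```shell
# Add NVIDIA's apt repo and signing key (per the install guide)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Wire the NVIDIA runtime into dockerd and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```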

Then the command in WSL sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi should return an nvidia-smi response similar to the above.

(Actually, on some of my machines, it shows a seg fault after the nvidia-smi printout, but I can access the GPUs fine.)