Hi, it looks to me as if the backend container is having problems finding your graphics card. Try this and let me know the output:

Open a terminal prompt into your default WSL distro and type:

nvidia-smi

Can you post the output? It should look something like this:
nick@FARSPACE:/mnt/s/Projects/nick-stable-diffusion$ nvidia-smi
Sun Sep  4 18:31:45 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  Off |
| 41%   32C    P8    22W / 480W |  16285MiB / 24564MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         1      C   /python3.10                        N/A   |
+-----------------------------------------------------------------------------+
Can you let me know your OS, your GPU's make/model, CPU make/model and memory?
If nvidia-smi is an unknown command, then have a read here about installing the NVIDIA drivers on WSL2.
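As a quick sanity check once the Windows driver is installed, you could also try this - my own sketch rather than anything from the guide, and the /usr/lib/wsl/lib path is my assumption of where WSL2 exposes the GPU libraries:

# Run inside your default WSL distro. You should NOT install a Linux NVIDIA driver there;
# the GPU comes through the Windows driver.
nvidia-smi               # should list your GPU
ls /usr/lib/wsl/lib      # should contain libcuda.so and related libraries if passthrough is working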
I am running Windows 10, EVGA 3070, AMD 5950X, 64GB DDR4. Here is the output:
C:\Users\David\Dev\nick-stable-diffusion>nvidia-smi
Sun Sep 4 12:38:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.94 Driver Version: 516.94 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:2D:00.0 On | N/A |
| 0% 50C P8 20W / 100W | 1186MiB / 8192MiB | 45% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
That's good! WSL can see the GPU.
Next, see if you can start this container with this command:
docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 /bin/bash
You should get a bash prompt, in which case repeat the command:
nvidia-smi
In my case:
PS S:\Projects\nick-stable-diffusion> docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 /bin/bash
root@2bf83e4ee145:/# nvidia-smi
Sun Sep  4 17:44:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  Off |
| 41%   32C    P8    22W / 480W |   6382MiB / 24564MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         1      C   /python3.10                        N/A   |
+-----------------------------------------------------------------------------+
root@2bf83e4ee145:/#
Does that work?
That one also looks like it blows up
C:\Users\David\Dev\nick-stable-diffusion>docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 /bin/bash
Unable to find image 'nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04' locally
11.7.1-cudnn8-devel-ubuntu22.04: Pulling from nvidia/cuda
d19f32bd9e41: Already exists
e41c57a38d59: Already exists
339e43a75ab8: Already exists
ab109f3e6da8: Already exists
6c10428f3e1b: Already exists
3aa878388cfd: Already exists
292462f05f9d: Already exists
7d28d91cdea6: Already exists
2b6ae3eb24b2: Already exists
8e9b6e224ece: Already exists
Digest: sha256:71cb81d8169c3a486fb865e8cd8dee8d3c09b27f350dd0a3223069e55574ea69
Status: Downloaded newer image for nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: signal: segmentation fault, stdout: , stderr:: unknown.
Right, OK. It may well be an incompatibility between the latest NVIDIA code inside the container image and your Windows or driver setup. Looking at Docker Hub, there are earlier editions than the ...22.04 one I chose for the image:
https://hub.docker.com/r/nvidia/cuda
...but most refer to CUDA 11.7.1
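(If it's easier than clicking through the website, I believe you can also list the matching tags from a shell - this is just my sketch, and it assumes curl and jq are installed and that Docker Hub's v2 tags API behaves as I remember:)

curl -s "https://hub.docker.com/v2/repositories/nvidia/cuda/tags/?page_size=100" | jq -r '.results[].name' | grep '11.7.1-cudnn8-devel'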
You could try some 'earlier' versions in the list, such as Ubuntu 20.04, using this command with the release year substituted for XX (e.g. 20):
docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntuXX.04 /bin/bash
...until one works. Once that container image is up and running, we can try changing the FROM line in the backend's Dockerfile.
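For reference, that edit would look something like this - just a sketch, assuming 20.04 turns out to be the tag that works for you:

# In the backend's Dockerfile, swap the base image tag for whichever one ran cleanly:
# FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04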
If it doesn't work, then we can see if we can go back to slightly earlier versions of CUDA, although your nvidia-smi reports that it can see CUDA 11.7 drivers.
Alas, Sunday dinner beckons (I am in the UK) but I can pick this up later - let me know how you get on.
Hi @djbielejeski any update? I'm keen to fix this issue and your help will be invaluable.
Here is the output for 20.04:
C:\Users\David\Dev\StableDiffusion\nick-stable-diffusion>docker run --gpus all -it nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04 /bin/bash
Unable to find image 'nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04' locally
11.7.1-cudnn8-devel-ubuntu20.04: Pulling from nvidia/cuda
3b65ec22a9e9: Pull complete
9bfa49b064c8: Pull complete
cde16ef91ac2: Pull complete
978ea3dcd5fb: Pull complete
6c10428f3e1b: Pull complete
25e1b86ea3b6: Pull complete
ab995ea0d0d0: Pull complete
25c7edfc13eb: Pull complete
daf4a0e65bea: Pull complete
200f3698d6ff: Pull complete
Digest: sha256:1870b4ecbb90c9d11baf8aad83efa446bcc72dca0c8ec4230a21dbd2244999c3
Status: Downloaded newer image for nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: signal: segmentation fault, stdout: , stderr:: unknown.
I think the issue is that I can't run these images on Windows. The nvidia/cuda documentation states I need to have the nvidia-container-toolkit installed, but that is not supported on Windows.
Hi @djbielejeski thanks for the update. It certainly should work on Windows 10 and 11, so here's something new to try:
At a PowerShell prompt, run this command:
wsl --status
Here is the response from my Windows 11 PC:
PS C:\Users\nick> wsl --status
Default Distribution: Ubuntu
Default Version: 2
Windows Subsystem for Linux was last updated on 28/07/2022
WSL automatic updates are on.
Kernel version: 5.10.102.1
Please can you share what you see? If the Default Version is 1, or the kernel version is lower than 5.xx.xxx.x, this may be the issue, and you can upgrade these yourself.
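(For what it's worth, the upgrade is normally just a couple of PowerShell commands - a sketch, assuming a recent enough Windows build that wsl --update is available:)

wsl --set-default-version 2   # make WSL2 the default for new distros
wsl --update                  # pull the latest WSL kernel
wsl --shutdown                # restart WSL, then restart Docker Desktop so it picks up the new kernel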
Looks like I am getting this.
C:\Users\David\Dev\StableDiffusion\nicolai256>wsl --status
Default Distribution: docker-desktop
Default Version: 2
Windows Subsystem for Linux was last updated on 4/6/2022
WSL automatic updates are on.
Kernel version: 5.10.102.1
So that matches what you have.
I am going to try installing Ubuntu and setting that to my default distribution.
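If I have the commands right, from PowerShell that should be something like this (exact distro name may vary):

wsl --install -d Ubuntu
wsl --set-default Ubuntu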
Hi @djbielejeski - did your installation of Ubuntu make a difference? If it did not, could you go to the original stable diffusion repo (linked at the top of the README.md) and see if you can get that running? If you can get that version going but not mine, the fault is something I have missed, so I would be keen to know either way - and hey, thanks for your patience!
Hi @djbielejeski any news? I'll mark the issues as closed if not. Many thanks
Hey @nicklansley, no luck on my end. I think we can close this since it's a local computer problem and not specific to your repo. Thanks for all your help trying to troubleshoot with me 👍
@djbielejeski no problem at all! If you do find the answer let me know!
Hey Nick,
I am getting this issue when running
docker compose up -d --build
C:\Users\David\Dev\nick-stable-diffusion>docker compose up -d --build
[+] Building 1.0s (74/74) FINISHED
 => [nick-stable-diffusion_frontend internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 32B 0.0s
 => [nick-stable-diffusion_sd-backend internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 32B 0.0s
 => [nick-stable-diffusion_scheduler internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 32B 0.0s
 => [nick-stable-diffusion_frontend internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [nick-stable-diffusion_scheduler internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [nick-stable-diffusion_sd-backend internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [nick-stable-diffusion_frontend internal] load metadata for docker.io/library/python:3.10-alpine 0.7s
 => [nick-stable-diffusion_scheduler internal] load metadata for docker.io/library/redis:7.0.2-alpine3.16 0.7s
 => [nick-stable-diffusion_sd-backend internal] load metadata for docker.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 0.6s
 => [nick-stable-diffusion_sd-backend 1/38] FROM docker.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04@sha256:71cb81d8169c3a486fb865e8cd8dee8d3c09b27f350dd0a3223069e55574ea69 0.0s
 => [nick-stable-diffusion_sd-backend internal] load build context 0.0s
 => => transferring context: 10.52kB 0.0s
 => [nick-stable-diffusion_scheduler 1/11] FROM docker.io/library/redis:7.0.2-alpine3.16@sha256:5916c280afae05baf0dc9a0cc82fa8e51477bdbfc72f60a5c14fd2b7735bcf07 0.0s
 => [nick-stable-diffusion_scheduler internal] load build context 0.0s
 => => transferring context: 124B 0.0s
 => CACHED [nick-stable-diffusion_scheduler 2/11] COPY redis.conf /usr/local/etc/redis/redis.conf 0.0s
 => CACHED [nick-stable-diffusion_scheduler 3/11] COPY redis.conf /data/db/dummy.txt 0.0s
 => CACHED [nick-stable-diffusion_scheduler 4/11] RUN chmod a+rwx -R /data/db && rm /data/db/dummy.txt 0.0s
 => CACHED [nick-stable-diffusion_scheduler 5/11] RUN apk add nodejs npm python3 py3-pip 0.0s
 => CACHED [nick-stable-diffusion_scheduler 6/11] RUN npm i redis http-server 0.0s
 => CACHED [nick-stable-diffusion_scheduler 7/11] WORKDIR /app 0.0s
 => CACHED [nick-stable-diffusion_scheduler 8/11] COPY ./requirements.txt /app/requirements.txt 0.0s
 => CACHED [nick-stable-diffusion_scheduler 9/11] RUN pip3 install -r requirements.txt 0.0s
 => CACHED [nick-stable-diffusion_scheduler 10/11] COPY run.sh /app 0.0s
 => CACHED [nick-stable-diffusion_scheduler 11/11] COPY scheduler.py /app 0.0s
 => [nick-stable-diffusion_frontend 1/12] FROM docker.io/library/python:3.10-alpine@sha256:0c46c7f15ee201a2e2dc3579dbc302f989a20b1283e67f884941e071372eb2cc 0.0s
 => [nick-stable-diffusion_frontend internal] load build context 0.0s
 => => transferring context: 530B 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 2/38] WORKDIR /app 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 3/38] RUN apt-get update && apt-get install -y python3-pip 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 4/38] RUN apt-get install -y git python3-setuptools curl ffmpeg libsm6 libxext6 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 5/38] RUN curl https://sh.rustup.rs -sSf | bash -s -- -y 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 6/38] RUN echo 'export PATH="$HOME/.cargo/env:$PATH"' >> $HOME/.bashrc 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 7/38] RUN pip3 install --upgrade pip 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 8/38] RUN pip3 install "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 9/38] RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 10/38] RUN pip3 install --upgrade jupyter 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 11/38] RUN pip3 install --upgrade cython 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 12/38] RUN pip3 install --upgrade setuptools-rust 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 13/38] RUN pip3 install --upgrade matplotlib 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 14/38] RUN pip3 install --upgrade opencv-python==4.5.4.60 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 15/38] RUN pip3 install --upgrade more_itertools~=8.12.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 16/38] RUN pip3 install --upgrade youtokentome~=1.0.6 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 17/38] RUN pip3 install --upgrade omegaconf>=2.0.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 18/38] RUN pip3 install --upgrade einops~=0.3.2 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 19/38] RUN pip3 install --upgrade segmentation-models-pytorch==0.1.3 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 20/38] RUN pip3 install --upgrade PyWavelets==1.1.1 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 21/38] RUN PATH="$HOME/.cargo/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" && pip3 install --upgrade transformers~=4.19.2 # was 4.10.2 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 22/38] RUN pip3 install --upgrade albumentations==0.4.3 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 23/38] RUN pip3 install --upgrade diffusers 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 24/38] RUN pip3 install --upgrade pudb==2019.2 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 25/38] RUN pip3 install --upgrade invisible-watermark 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 26/38] RUN pip3 install --upgrade imageio==2.9.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 27/38] RUN pip3 install --upgrade imageio-ffmpeg==0.4.2 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 28/38] RUN pip3 install --upgrade pytorch-lightning==1.4.2 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 29/38] RUN pip3 install --upgrade omegaconf==2.1.1 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 30/38] RUN pip3 install --upgrade test-tube>=0.7.5 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 31/38] RUN pip3 install --upgrade streamlit>=0.73.1 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 32/38] RUN pip3 install --upgrade einops==0.3.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 33/38] RUN pip3 install --upgrade torch-fidelity==0.3.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 34/38] RUN pip3 install --upgrade kornia==0.6 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 35/38] RUN pip3 install --upgrade torchmetrics==0.6.0 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 36/38] RUN pip3 install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 37/38] RUN pip3 install -e git+https://github.com/openai/CLIP.git@main#egg=clip 0.0s
 => CACHED [nick-stable-diffusion_sd-backend 38/38] COPY . /app 0.0s
 => [nick-stable-diffusion_sd-backend] exporting to image 0.2s
 => => exporting layers 0.0s
 => => writing image sha256:387fd6959ede65cca503fc7b7a082c87d88a3bbc5b0197d04d095a342f66c6ed 0.0s
 => => naming to docker.io/library/nick-stable-diffusion_scheduler 0.0s
 => => writing image sha256:a6140e464d4d9fc75a679ca6a208a56f8f0dd5614854f608ecd4c1a2f646ad31 0.0s
 => => naming to docker.io/library/nick-stable-diffusion_frontend 0.0s
 => => writing image sha256:fce5ff84654ac64f2be6fa040babb0bf253bce3a07051145784096a1dd1c6662 0.0s
 => => naming to docker.io/library/nick-stable-diffusion_sd-backend 0.0s
 => CACHED [nick-stable-diffusion_frontend 2/12] RUN apk update && apk upgrade && apk add --no-cache curl 0.0s
 => CACHED [nick-stable-diffusion_frontend 3/12] WORKDIR /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 4/12] COPY ./requirements.txt /app/requirements.txt 0.0s
 => CACHED [nick-stable-diffusion_frontend 5/12] RUN pip3 install -r requirements.txt 0.0s
 => CACHED [nick-stable-diffusion_frontend 6/12] COPY *.html /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 7/12] COPY *.js /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 8/12] COPY *.py /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 9/12] COPY *.css /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 10/12] COPY *.png /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 11/12] COPY *.ico /app 0.0s
 => CACHED [nick-stable-diffusion_frontend 12/12] COPY site.webmanifest /app 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 2/3