[Bug] Docker Script does not detect GPU

D34DC3N73R commented 1 year ago

Describe the bug: The GPU test in the Docker script fails even when NVIDIA GPUs are properly installed and available in Docker.

$ ./docker-run.sh
 OS/Arch:           linux/amd64
  OS/Arch:          linux/amd64
Docker Compose version v2.2.2
Docker could not find your NVIDIA GPU

I'm not sure exactly what this check is searching for in the nvidia-smi output: egrep -e 'NVIDIA.*On'
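The pattern 'NVIDIA.*On' is an extended regular expression that matches any line containing "NVIDIA" followed later on the same line by "On", and the match is case-sensitive. The sketch below runs the pattern against a few lines taken from the nvidia-smi output in this thread (spacing normalized) to show which ones match. The helper name old_check_matches is hypothetical, and grep -E is used in place of egrep, which is an equivalent but obsolescent spelling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given line matches the script's pattern.
old_check_matches() {
    printf '%s\n' "$1" | grep -qE 'NVIDIA.*On'
}

# Header and GPU rows from the nvidia-smi outputs posted in this issue.
header='| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |'
quadro='|   0  Quadro P2000        Off  | 00000000:09:00.0 Off |                  N/A |'
geforce='|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |'

for line in "$header" "$quadro" "$geforce"; do
    if old_check_matches "$line"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

Only the GeForce row matches: the header contains "NVIDIA" but only a lowercase "on" (in "Version"), and the Quadro row contains neither "NVIDIA" nor "On".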

$ docker run -it  --gpus=all --rm nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
Fri Oct 21 20:26:41 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:09:00.0 Off |                  N/A |
| 51%   44C    P0    18W /  75W |   3863MiB /  5057MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

To Reproduce: steps to reproduce the behavior.

git clone
cd stablediffusion-infinity/docker
./docker-run.sh
See error

Expected behavior: I would expect any output from nvidia-smi to be sufficient.
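One way to get that behavior is to rely on the exit status of nvidia-smi inside a GPU-enabled container instead of grepping its output: docker run exits with the status of the container's command (or with its own nonzero error code if the container cannot start), so any failure means the GPU was not usable. A minimal sketch, assuming a hypothetical helper named check_gpu that takes the command to run as arguments so it can be exercised without Docker:

```shell
#!/bin/sh
# Hypothetical helper: runs the given command and reports whether it succeeded.
# The command's output is discarded; only its exit status matters.
check_gpu() {
    if "$@" >/dev/null 2>&1; then
        echo "NVIDIA GPU is available in Docker"
        return 0
    fi
    echo "Docker could not find your NVIDIA GPU" >&2
    return 1
}
```

In a script such as docker-run.sh, the grep-based test could then be replaced by something like check_gpu docker run --rm --gpus=all nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi || exit 1, reusing the image from the command above. The -it flags are left out because a script may not have a TTY attached.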

Screenshots: N/A

Desktop:

OS: Ubuntu Server 20.04
Browser: N/A

tpsjr7 commented 1 year ago

That output from nvidia-smi looks fine. In your case, you can just delete that check from the script so it continues on to the app. I'll think about a better way to detect GPUs for other users in general. For comparison, the output on my machine looks like the one below, so the check was simply finding those GPUs. I see you have a Quadro and don't see why it wouldn't work, although that amount of GPU memory (3863MiB / 5057MiB) looks kind of low. I'm not sure what the minimum supported amount of VRAM is, so if you start getting out-of-memory errors down the line, you might try closing your other open programs (like your web browser) to free up memory.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   53C    P8    28W / 350W |   2284MiB / 24576MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   50C    P8     7W / 151W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

D34DC3N73R commented 1 year ago

The difference is that driver persistence mode is enabled on your GPUs. I'm not sure whether it's required here, but the app seemed to run fine without persistence mode enabled. I tried enabling persistence mode, but the check still fails with a Quadro GPU. I changed the check to egrep -e 'NVIDIA-SMI' and it ran fine.
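Enabling persistence mode would not be enough on its own for this card: its GPU row reads "Quadro P2000" with no "NVIDIA" prefix, so no single line contains both "NVIDIA" and "On", and the header line contains "NVIDIA" but no capital "On". The relaxed pattern 'NVIDIA-SMI' instead matches the header that nvidia-smi prints whenever it runs. The sketch below compares both patterns on the header plus a hypothetical Quadro row with persistence mode switched on (that row is constructed for illustration, not copied from real output); the function names are also hypothetical.

```shell
#!/bin/sh
# Two candidate checks, each reading nvidia-smi output on stdin.
old_check() { grep -qE 'NVIDIA.*On'; }
new_check() { grep -qE 'NVIDIA-SMI'; }

# Real header line plus a hypothetical Quadro row with persistence mode on.
sample='| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|   0  Quadro P2000        On   | 00000000:09:00.0 Off |                  N/A |'

printf '%s\n' "$sample" | old_check && echo "old check: pass" || echo "old check: fail"
printf '%s\n' "$sample" | new_check && echo "new check: pass" || echo "new check: fail"
```

The relaxed check only confirms that nvidia-smi ran inside the container; it says nothing about how many GPUs were found or how much memory they have.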

Regarding memory, I was also running DeepStack, CompreFace, and some other apps that use the GPU when I ran nvidia-smi. It's a headless server, so GPU VRAM usage can be reduced easily. 5GB does seem pretty low, and I did run into some memory issues, so if you have any tips for running with low memory, I'd be interested in hearing them.
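When other GPU services share the card, it can help to check how much VRAM is actually free before starting the container. The command nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits prints one "index, free MiB" line per GPU. The sketch below parses that format from stdin; the function name gpus_with_free_mib and the 4096 MiB threshold are hypothetical, since this thread does not establish a minimum VRAM requirement.

```shell
#!/bin/sh
# Hypothetical helper: reads "index, free_MiB" lines (the format printed by
# nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits)
# and prints the index of every GPU with at least $1 MiB free.
gpus_with_free_mib() {
    awk -F', *' -v min="$1" '$2 + 0 >= min + 0 { print $1 }'
}

# Illustrative input for two GPUs (values chosen for the example only).
printf '0, 1194\n1, 8192\n' | gpus_with_free_mib 4096

# On a real host the input would come from nvidia-smi directly:
# nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | gpus_with_free_mib 4096
```

With the sample input this prints only 1, because GPU 0 has less than the assumed 4096 MiB free.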

lkwq007/stablediffusion-infinity, issue #83