Open D34DC3N73R opened 1 year ago
That output from nvidia-smi looks fine. In your case you can just delete that check from the script so it continues on to the app; I'll think about a better way to detect GPUs for other users in general. For comparison, mine looks like the output below, so the check was simply finding those GPUs. I see you have a Quadro and don't see why it wouldn't work, although that amount of GPU memory (3863MiB / 5057MiB) looks kind of low. I'm not sure what the minimum supported amount of VRAM is, so if you seem to be getting out-of-memory errors down the line, you might try closing all your open Windows programs (like your web browser) to free up memory.
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...   On  | 00000000:01:00.0  On |                  N/A |
|  0%   53C    P8    28W / 350W |   2284MiB / 24576MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...   On  | 00000000:02:00.0 Off |                  N/A |
|  0%   50C    P8     7W / 151W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
The difference is that driver persistence mode is enabled on yours. I'm not sure whether it's required here, but it seemed to run fine without persistence mode enabled. I tried enabling persistence mode, but the check still fails with a Quadro GPU. I changed the check to `egrep -e 'NVIDIA-SMI'` and it ran fine.
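To illustrate the difference between the two patterns, here is a quick sketch. The Quadro row below is made up for illustration (it is not output from this machine), but the banner line mirrors the one quoted above:

```shell
# Illustrative nvidia-smi table lines (the Quadro row is hypothetical).
# A Quadro row where Persistence-M is reported as "Off":
gpu_row='|   0  Quadro P2000        Off  | 00000000:01:00.0 Off |                  N/A |'
# The banner line, which nvidia-smi prints whenever it runs at all:
banner='| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |'

# Stricter check: requires "On" somewhere after "NVIDIA", so this row is
# missed (the name has no "NVIDIA" and persistence reads "Off").
if echo "$gpu_row" | grep -qE 'NVIDIA.*On'; then
    echo "strict check: GPU found"
else
    echo "strict check: GPU missed"
fi

# Looser check from this thread: match the banner instead.
if echo "$banner" | grep -qE 'NVIDIA-SMI'; then
    echo "loose check: GPU found"
fi
```

The looser pattern matches as long as nvidia-smi produces any output, which is closer to the expected behavior described in this issue.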
Regarding memory, I was also running DeepStack, CompreFace, and some other apps making use of the GPU when I ran `nvidia-smi`. It's a headless server, so GPU VRAM usage can be reduced easily. 5GB does seem pretty low, and I did run into some memory issues, so if you have any tips on running with low memory, I'd be interested in hearing them.
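One low-effort way to keep an eye on headroom is nvidia-smi's CSV query mode. A sketch, where the sample line stands in for live output (the figures are the ones quoted earlier in this thread):

```shell
# Sketch: compute free VRAM from nvidia-smi's CSV query output.
# On a live system the sample line would come from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
sample='3863, 5057'   # MiB used, MiB total

used=${sample%%,*}      # text before the comma  -> "3863"
total=${sample##*, }    # text after ", "        -> "5057"
free=$((total - used))
echo "free VRAM: ${free} MiB"   # -> free VRAM: 1194 MiB
```

Run periodically (e.g. under `watch`), this makes it easy to see which app is eating into the 5GB before an out-of-memory error hits.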
Describe the bug
The docker test conditions fail even when NVIDIA GPUs are properly installed and available in Docker.

I'm not sure exactly what this check is searching for in the nvidia-smi output:

`egrep -e 'NVIDIA.*On'`
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect any output from nvidia-smi to be sufficient.
Screenshots N/A
Desktop (please complete the following information):