Closed ShadowTime1290 closed 1 year ago
It is not currently possible here, as this image is based on Ubuntu 20.04 rather than 16.04, which the models were compiled for. I don't know whether this matters; I am investigating it now.
GPU acceleration will also add 2GB to the image size, as the CUDA toolkit is massive.
Edit:
Indeed, the models need to be rebuilt: OSError: /lib/x86_64-linux-gnu/libcublas.so.9.0: version 'libcublas.so.9.0' not found (required by /app/obico/ml_api/bin/model_gpu_x86_64.so)
It's probably possible to install these packages from the bionic sources, but that will likely introduce more issues.
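The OSError above is a plain dynamic-linker failure, and `ldd` will show which of the compiled model's dependencies are unresolved. A minimal sketch (the .so path is taken from the error message; adjust for your own install):

```shell
# Check why a compiled model .so refuses to load: list unresolved deps.
SO=/app/obico/ml_api/bin/model_gpu_x86_64.so   # path from the OSError above
if [ -e "$SO" ]; then
  # Any line containing "not found" is a library version the system lacks.
  status=$(ldd "$SO" | grep "not found" || echo "all dependencies resolved")
else
  status="missing: $SO"
fi
echo "$status"
```

If `ldd` reports `libcublas.so.9.0 => not found`, the image simply does not ship that CUDA 9 runtime, which matches the rebuild conclusion above.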
+1 for GPU acceleration - would be really nice to get CUDA support.
Is there an option to use an external AI solution like CodeProject.AI or something that already has CUDA support?
This could possibly be supported in the new Obico release, once I get my head around all the conflicting pip packages.
would you take a 3.5gb+ image for gpu acceleration over a 1gb image?
I would!
Try giving ghcr.io/imagegenius/igpipepr-obico:bfdccef9-pkg-8d778c6a-pr-12
a shot. The image is 7.69GB uncompressed, so yeah, make some room...
as stated in the obico docs, you'll see
...
obico-server-ml_api-1 | ----- Trying to load weights: /app/lib/../model/model-weights.xxxx - **use_gpu = True** -----
...
Succeeded!
...
if it's using the GPU
Working! I can see it finding my GPU in the logs during startup.
I am running this on Unraid, so I had to add "--runtime=nvidia" to "Extra Parameters" and add two variables: "NVIDIA_VISIBLE_DEVICES" and "NVIDIA_DRIVER_CAPABILITIES".
You can just set --gpus=all
in Extra Parameters; no need for setting the runtime or those variables.
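The two approaches are equivalent; as a sketch, here is what each expands to as a plain docker run (assumes the nvidia-container-toolkit is installed on the host; the env-var values are typical defaults, not taken from this thread - adjust as needed):

```shell
IMAGE="ghcr.io/imagegenius/obico:cuda"

# Modern flag - what --gpus=all in Unraid's Extra Parameters does:
CMD_NEW="docker run -d --name obico --gpus all $IMAGE"

# Older runtime + env-var approach described above
# (compute,utility are assumed typical values for the capabilities var):
CMD_OLD="docker run -d --name obico --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility $IMAGE"

printf '%s\n%s\n' "$CMD_NEW" "$CMD_OLD"
```

Either form hands the container the same GPU devices; `--gpus all` is just less to type.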
no gripes?
You are correct! That worked as well. Thanks!
Appears to be working! Thank you.
I couldn't get it to work with just the CPU after installing the CUDA dependencies
Let's try darknet lib built with GPU support - /darknet/libdarknet_gpu.so
Done! Hooray! Now we have darknet with GPU support.
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.darknet - use_gpu = True -----
Try to load cfg: /app/obico/ml_api/model/model.cfg, weights: /app/obico/ml_api/lib/../model/model-weights.darknet, clear = 0
CUDA status Error: file: ./src/dark_cuda.c: func: get_gpu_compute_capability() line: 619
CUDA Error: no CUDA-capable device is detected
/usr/bin/python3.7: get_gpu_compute_capability: Unknown error 368483375
[2023-06-29 22:48:28 +1000] [2031] [INFO] Booting worker with pid: 2031
Let's try darknet lib built with GPU support - /darknet/libdarknet_gpu.so
Done! Hooray! Now we have darknet with GPU support.
worker: Warm shutdown (MainProcess)
[2023-06-29 22:48:28 +1000] [1278] [INFO] Handling signal: term
[2023-06-29 12:48:28,725: INFO/MainProcess] beat: Shutting down...
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.darknet - use_gpu = True -----
Try to load cfg: /app/obico/ml_api/model/model.cfg, weights: /app/obico/ml_api/lib/../model/model-weights.darknet, clear = 0
CUDA status Error: file: ./src/dark_cuda.c: func: get_gpu_compute_capability() line: 619
CUDA Error: no CUDA-capable device is detected
/usr/bin/python3.7: get_gpu_compute_capability: Unknown error 368483375
[2023-06-29 22:48:28 +1000] [1278] [INFO] Shutting down: Master
(kept repeating this non-stop)
So I'm separating the branches into main
and cuda
- self-explanatory names. Once :cuda
is ready for testing, hopefully someone here will be willing...
Can I just throw the :cuda branch onto mine and pull?
Yes, once I get it to build.
Ok. Standing by.
ghcr.io/imagegenius/obico:cuda
- give it a shot. Afterwards you can revert back to ghcr.io/imagegenius/obico:bfdccef9-ig57
(same as what you're probably on now); :latest
does not include GPU dependencies anymore.
Currently on ghcr.io/imagegenius/igpipepr-obico:bfdccef9-pkg-8d778c6a-pr-12
I have switched to ghcr.io/imagegenius/obico:cuda
Here is what I'm seeing in the logs - Obico does boot despite this.
2023-06-29 12:19:16.740658694 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
[2023-06-29 15:19:17,543: INFO/MainProcess] Connected to redis://192.168.2.6:6380//
[2023-06-29 15:19:17,551: INFO/MainProcess] mingle: searching for neighbors
[2023-06-29 15:19:17,628: INFO/Beat] beat: Starting...
Let's try darknet lib built with GPU support - /darknet/libdarknet_gpu.so
Nope! Failed to load darknet lib built with GPU support. erors=libcudnn.so.8: cannot open shared object file: No such file or directory
Now let's try darknet lib on CPU - /darknet/libdarknet_cpu.so
Error during importing YoloNet! - /darknet/libdarknet_cpu.so: cannot open shared object file: No such file or directory
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.darknet - use_gpu = True -----
Failed! - Not loading darknet net due to previous import failure. Check earlier log for errors.
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.onnx - use_gpu = True -----
Succeeded!
Try repulling the latest :cuda
I don't get these errors; however, my printer is not connected, so darknet is probably not initialising without the printer connected.
[2023-06-30 09:23:20 +1000] [775] [INFO] Starting gunicorn 19.9.0
[2023-06-30 09:23:20 +1000] [775] [INFO] Listening at: http://0.0.0.0:3333 (775)
[2023-06-30 09:23:20 +1000] [775] [INFO] Using worker: sync
[2023-06-30 09:23:20 +1000] [808] [INFO] Booting worker with pid: 808
django.db.backends DEBUG (0.002)
SELECT name, type FROM sqlite_master
WHERE type in ('table', 'view') AND NOT name='sqlite_sequence'
ORDER BY name; args=None
django.db.backends DEBUG (0.000) SELECT "django_migrations"."app", "django_migrations"."name" FROM "django_migrations"; args=()
2023-06-30 09:23:21.405498760 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
[2023-06-29 23:23:21,620: INFO/MainProcess] Connected to redis://localhost:6379//
[2023-06-29 23:23:21,623: INFO/MainProcess] mingle: searching for neighbors
[2023-06-29 23:23:21,639: INFO/Beat] beat: Starting...
[2023-06-29 23:23:22,630: INFO/MainProcess] mingle: all alone
[2023-06-29 23:23:22,635: WARNING/MainProcess] /usr/lib/python3.7/site-packages/celery/fixups/django.py:206: UserWarning: Using settings.DEBUG leads to a memory
leak, never use this setting in production environments!
leak, never use this setting in production environments!''')
[2023-06-29 23:23:22,635: INFO/MainProcess] celery@701544e29c04 ready.
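In logs like these, the thing to grep for is the `use_gpu = True` line followed by `Succeeded!`. A sketch of that check, with a heredoc standing in for the real `docker logs obico` output (hypothetical container name; swap the function body for the real command when running for real):

```shell
# Stand-in for `docker logs obico`; replace the heredoc with the real
# command output when checking an actual container.
obico_logs() {
cat <<'EOF'
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.onnx - use_gpu = True -----
Succeeded!
EOF
}

# Count weight loads where use_gpu = True was immediately followed by success.
gpu_loads=$(obico_logs | grep -A1 "use_gpu = True" | grep -c "Succeeded!")
echo "GPU model loads seen: $gpu_loads"
```

A count of zero means every GPU load attempt fell through to the failure branches shown above.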
Pulled latest. Getting:
2023-06-30 08:35:44.908443042 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
[2023-06-30 11:35:45,754: INFO/Beat] beat: Starting...
[2023-06-30 11:35:45,921: INFO/MainProcess] Connected to redis://192.168.2.6:6380//
[2023-06-30 11:35:45,929: INFO/MainProcess] mingle: searching for neighbors
Let's try darknet lib built with GPU support - /darknet/libdarknet_gpu.so
Nope! Failed to load darknet lib built with GPU support. erors=libcudnn.so.8: cannot open shared object file: No such file or directory
Now let's try darknet lib on CPU - /darknet/libdarknet_cpu.so
Done! Darknet is now running on CPU.
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.darknet - use_gpu = True -----
Failed! - I respectfully decline to load the net as I am asked to use GPU but the loaded darknet module does NOT have GPU support
----- Trying to load weights: /app/obico/ml_api/lib/../model/model-weights.onnx - use_gpu = True -----
Succeeded!
apologies for the delay, this should have been fixed a few weeks ago https://github.com/imagegenius/docker-obico/commit/d4f608b33a95b8e6236834e0b85826ef95b0296b
Updated! Now getting the following:
CUDA Error: forward compatibility was attempted on non supported HW
[2023-07-24 11:31:31 -0300] [3179] [INFO] Booting worker with pid: 3179
/usr/bin/python3.7: get_gpu_compute_capability: Unknown error -1645007825
Try to load cfg: /app/obico/ml_api/model/model.cfg, weights: /app/obico/ml_api/lib/../model/model-weights.darknet, clear = 0
CUDA status Error: file: ./src/dark_cuda.c: func: get_gpu_compute_capability() line: 619
🙄 please standby
Were you using the :cuda branch? Works for me...
root@Discovery:~# nvidia-smi
Sun Jul 30 14:07:10 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1660 ... Off | 00000000:01:00.0 Off | N/A |
| 0% 49C P8 18W / 125W | 827MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 4002 C /usr/bin/python3.7 824MiB |
+---------------------------------------------------------------------------------------+
[2023-07-30 04:06:13,604: INFO/MainProcess] mingle: all alone
[2023-07-30 04:06:13,609: WARNING/MainProcess] /usr/lib/python3.7/site-packages/celery/fixups/django.py:206: UserWarning: Using settings.DEBUG leads to a memory
leak, never use this setting in production environments!
leak, never use this setting in production environments!''')
[2023-07-30 04:06:13,609: INFO/MainProcess] celery@a999403393ec ready.
23 conv 1024 3 x 3/ 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
24 conv 1024 3 x 3/ 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
25 route 16 -> 26 x 26 x 512
26 conv 64 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BF
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24 -> 13 x 13 x1280
29 conv 1024 3 x 3/ 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BF
30 conv 30 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 30 0.010 BF
31 detection
mask_scale: Using default '1.000000'
Total BFLOPS 29.338
avg_outputs = 607364
Allocate additional workspace_size = 131.08 MB
Try to load cfg: /app/obico/ml_api/model/model.cfg, weights: /app/obico/ml_api/lib/../model/model-weights.darknet, clear = 0
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
Create CUDA-stream - 0
Create cudnn-handle 0
Try to load weights: /app/obico/ml_api/lib/../model/model-weights.darknet
Loading weights from /app/obico/ml_api/lib/../model/model-weights.darknet...Done! Loaded 32 layers from weights-file
using this compose for reference:
version: "3"
services:
  obico:
    image: ghcr.io/imagegenius/obico:cuda
    container_name: obico
    env_file: stack.env
    volumes:
      - /mnt/user/appdata/obico:/config
    networks:
      br0.2:
        ipv4_address: 192.168.2.3
    ports:
      - 3334:3334
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # unraid labels
    labels:
      - net.unraid.docker.webui=http://[IP]:[PORT:3334]
      - net.unraid.docker.icon=https://raw.githubusercontent.com/imagegenius/templates/main/unraid/img/obico.png
networks:
  br0.2:
    external: true
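With a compose stanza like the above applied, the quickest sanity check is running nvidia-smi inside the container. Sketched here without actually invoking docker (the container name comes from the compose file):

```shell
# Build the verification command to run manually after `docker compose up -d`.
CONTAINER="obico"                          # container_name from the compose above
CHECK_CMD="docker exec $CONTAINER nvidia-smi"
echo "run: $CHECK_CMD"
```

If that command lists the GPU (as in the nvidia-smi table above), the deploy block is working; if it errors with "could not select device driver", the host is missing the nvidia-container-toolkit.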
Yup.
Pulled the latest :cuda. Same issue.
Does nvidia-smi show python 3.7?
Python 3.8.
@autumnwalker is there a newer version of the driver available for your GPU? 525.89.02 is old.
CUDA Error: forward compatibility was attempted on non supported HW
is sticking out to me.
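Forward-compatibility errors typically mean the host driver is older than the CUDA runtime baked into the image. A quick version comparison with `sort -V` (both version strings are taken from this thread; the "known good" one is from the working nvidia-smi output above):

```shell
installed="525.89.02"   # driver on the failing host, per the report above
working="535.54.03"     # driver version from the nvidia-smi output that worked

# sort -V orders version strings numerically; the last line is the newest.
newest=$(printf '%s\n%s\n' "$installed" "$working" | sort -V | tail -n1)
if [ "$newest" = "$installed" ]; then
  verdict="driver is at least as new as the known-good one"
else
  verdict="update the driver"
fi
echo "$verdict"
```

Here 525.89.02 sorts below 535.54.03, so the check recommends updating, which matches how this was resolved.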
Ahh! That appears to have done the trick. Thank you, and apologies for the wild goose chase.
What would be the best way to enable GPU acceleration, if at all possible?
It looks to be supported through native installation.
https://www.obico.io/docs/server-guides/advanced/nvidia-gpu/
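For the native-install route those docs describe, it is worth first confirming the CUDA toolchain is visible on the host. A minimal sketch:

```shell
# Report whether the NVIDIA driver utility and CUDA compiler are on PATH.
report=""
for tool in nvidia-smi nvcc; do
  if command -v "$tool" >/dev/null 2>&1; then
    report="$report$tool: found\n"
  else
    report="$report$tool: not found\n"
  fi
done
printf '%b' "$report"
```

If nvidia-smi is missing, install the driver before anything else; if only nvcc is missing, the driver is present but the CUDA toolkit is not.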