johnolafenwa / DeepStack

The World's Leading Cross Platform AI Engine for Edge Devices
Apache License 2.0

DeepStack GPU Docker image timeout #99

Open rickydua opened 3 years ago

rickydua commented 3 years ago

System: Debian 10 (Buster) with NVIDIA drivers from buster-backports, GTX 960 GPU

Hey there, I'm using the latest image from Docker Hub, deepquestai/deepstack:gpu. After following the guide here, I managed to launch the deepstack:gpu container, but every time I send an image for detection I get a timeout error.

{'success': False, 'error': 'failed to process request before timeout', 'duration': 0}
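For anyone reproducing this, the request itself is the standard DeepStack detection call; a minimal version (assuming the container is published on localhost:5000 as in the docker command below, and test.jpg is any local image):

curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/detection
# hangs for ~1 minute, then returns the 'failed to process request before timeout' error above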

Steps I took:

More info:

sudo nvidia-docker run --name=deepstack --gpus all -e MODE=High -e VISION-DETECTION=True -v deepstack:/datastore -p 5000:5000 deepquestai/deepstack:gpu
DeepStack: Version 2021.02.1
/v1/vision/detection
---------------------------------------
---------------------------------------
v1/backup
---------------------------------------
v1/restore
[GIN] 2021/04/02 - 22:05:09 | 500 |          1m0s |      172.17.0.1 | POST     /v1/vision/detection

Host Nvidia SMI

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     On   | 00000000:07:00.0 Off |                  N/A |
|  7%   45C    P8    14W / 130W |      1MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@21951afe7542:/app/server# cat logs/stderr.txt
exit status 1
chdir intelligencelayer\shared: The system cannot find the path specified

root@21951afe7542:/app/server# cat ../logs/stderr.txt 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 150, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 134, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
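The traceback means torch loaded fine but cannot see a CUDA device inside the container, so the model weights cannot be deserialized onto the GPU. A quick way to confirm this from the host (assuming the container was started with --name=deepstack as in the command above):

docker exec -it deepstack python3 -c "import torch; print(torch.cuda.is_available())"
# prints False here, which is exactly the condition that triggers the RuntimeError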
Kosh42 commented 3 years ago

Same. Shame as it has been solid for months.

gillonba commented 3 years ago

I wonder when there is going to be an update? It has been two months since the last commit to this project. It is a great project and it would be a shame to see it abandoned!

johnolafenwa commented 3 years ago

Hello @rickydua @Kosh42 @gillonba

Thanks for reporting this. Sorry we have not been able to attend to issues for a while now. We have an update to DeepStack coming this month.

On this issue, it appears DeepStack is unable to detect the GPU. Also, I notice from the nvidia-smi output above that the CUDA version is N/A (CUDA Version: N/A).

Did you attempt to install CUDA, and if so, which version was installed?

rickydua commented 3 years ago

@johnolafenwa I think that nvidia-smi output is from the host. I presume the Docker image has CUDA installed; I can run nvidia-smi inside the container and it reports a CUDA version.

chorus12 commented 2 years ago

Folks, it's a shame, but we have to update the docs for Docker on Linux... When you run a container with GPU support on Linux, you have to pass the --privileged flag so the container can access the NVIDIA devices on the host. You can also fiddle with the --device parameter, but the quickest way is just --privileged:

docker run --gpus all --privileged ...<rest of the parameters>
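Applied to the original command from this thread, that looks like the following (all flags other than --privileged are taken from the first post):

docker run --name=deepstack --gpus all --privileged -e MODE=High -e VISION-DETECTION=True -v deepstack:/datastore -p 5000:5000 deepquestai/deepstack:gpu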

bbrendon commented 2 years ago

I'm having this exact problem and the same error on Debian 11, but haven't been able to get past it. I tried --privileged as well.

william-bohannan commented 2 years ago

Have CPU working for all three: VISION-SCENE, VISION-DETECTION, VISION-FACE. Really nice work!

Now with the GPU option, only VISION-SCENE and VISION-DETECTION are working. VISION-FACE is timing out:

[GIN] 2022/03/18 - 22:53:21 | 500 | 1m0s | 172.17.0.1 | POST "/v1/vision/face/"

docker run --gpus all --privileged -e VISION-FACE=True -v /mnt/user/security/datastore:/datastore -p 5000:5000 deepquestai/deepstack:gpu-2022.01.1

Also tried deepquestai/deepstack:gpu-x5-beta with the same result.

Running Intel Core i5-6500 and GeForce GTX 1050 on Ubuntu 20.04 LTS, downloaded today, fresh install.

CUDA is working inside Docker; below is the test and its output:

docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
NVIDIA-SMI 510.54    Driver Version: 510.54    CUDA Version: 11.6

william-bohannan commented 2 years ago

Simple install notes to make this quick to replicate; also includes Python test scripts:

install-notes.txt python.zip

Also installed NVIDIA cuDNN 8, with the same timeout happening with DeepStack GPU face detection. Below are the steps taken:

OS="ubuntu2004"
sudo apt-get update
wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
apt search libcudnn
apt-get install libcudnn8 libcudnn8-dev

william-bohannan commented 2 years ago

Must have been the memory of the graphics card; everything is working on the 1080 Ti, which has 11 GB of memory.
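For anyone comparing cards, nvidia-smi can report VRAM totals directly (a standard query, nothing DeepStack-specific):

nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
# prints the card name plus total and currently used memory in MiB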

mikegleasonjr commented 2 years ago

Hi,

gpu-2022.01.1 does not work for me on any endpoint. I get a timeout after 1m.

gpu-2021.09.1 works for me on every endpoint, even without --privileged.

I have a GeForce GTX 1650 4G.

Here's my nvidia-smi on gpu-2022.01.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   40C    P8     7W /  75W |      0MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here's my nvidia-smi on gpu-2021.09.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   43C    P0    21W /  75W |   3072MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

EDIT: notice the memory usage is 0MiB / 3909MiB on gpu-2022.01.1... so nothing has been loaded, I guess.
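A simple way to watch for this (a generic check): keep nvidia-smi refreshing while the container starts and see whether the models ever get loaded into GPU memory:

watch -n 1 nvidia-smi
# on a working image the Memory-Usage column jumps (e.g. to ~3072MiB above) as the models load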

Brando47 commented 2 years ago

Same issue for me with a GTX 750 Ti: works perfectly with gpu-2021.09.1 but not with gpu-2022.01.1.

ghzgod commented 1 year ago

Any update to this?

rocket357 commented 1 year ago

Just an FYI, I ran across the same issue after Redis crashed on the Docker host I was running on. Probably not the most common cause, but the timeouts looked the same, and tailing /app/logs/stderr.txt in the container revealed the issue.

JPM-git commented 1 year ago

So a year and a half later, and still no news on something as fundamental as GPU support in image-recognition software.

LeorFinacre commented 3 months ago

Any news on this subject?

michaelyorkpa commented 3 months ago

Any news on this subject?

The project has been dead for over two years. You can switch to CodeProject AI. https://www.codeproject.com/AI/docs/

LeorFinacre commented 2 months ago

Oh, thank you for pointing me to the successor. Just a quick question. I see:

The Docker GPU version is specific to nVidia's CUDA enabled cards with compute capability >= 6.0

So my graphics card with compute capability 3.0 is useless with this project, I guess? Just to be sure: is there any way to use it anyway?

michaelyorkpa commented 2 months ago

I use the Windows version, so I can't speak to specifics. But I'd grab the Docker CPU version; once it's installed, there are multiple modules you can use that will operate on older GPUs. Within CPAI there are multiple types of processing modules available to install and use for image processing (and a few for sound, facial recognition, text processing like license plates, etc.).