johnolafenwa / DeepStack

The World's Leading Cross Platform AI Engine for Edge Devices

Enable sm_35 support in pytorch #129

Open CRCinAU opened 2 years ago

CRCinAU commented 2 years ago

I'm trying to get GPU processing working on my older 2GB GeForce GT 710, which I believe should be about as fast as a Jetson Nano...

When I try to run deepstack with GPU enabled, I get:

/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py:125: UserWarning: 
NVIDIA GeForce GT 710 with CUDA capability sm_35 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce GT 710 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device

Is there a way to enable sm_35 support in the pytorch build used in these containers? I can't quite see where it gets set...
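(For context: the list of supported architectures is baked in when the PyTorch wheel is compiled, so a prebuilt wheel can't be changed at runtime. A rough, untested sketch of a source build that would include sm_35, assuming the matching CUDA toolkit and the usual build dependencies are already installed:

# git clone --recursive https://github.com/pytorch/pytorch
# cd pytorch
# TORCH_CUDA_ARCH_LIST="3.5" python3 setup.py install
)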

Here's the nvidia-smi output from within the container:

root@7ea2aa82ebb1:/app/logs# nvidia-smi 
Wed Dec 15 09:44:56 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| 33%   37C    P0    N/A /  N/A |      0MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The supported CUDA architectures seem to be:

root@7ea2aa82ebb1:~# python3 -c "import torch; print(torch.cuda.get_arch_list())"
['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75']
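(A quick cross-check is to ask torch for the card's own compute capability; on the GT 710 this should come back as (3, 5), i.e. exactly the sm_35 missing from the list above:

python3 -c "import torch; print(torch.cuda.get_device_capability(0))"
)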
CRCinAU commented 2 years ago

I've been testing more and more, and with this docker-compose.yaml file TensorFlow detects the GPU OK:

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

Output:

Creating tensor_test_1 ... done
Attaching to tensor_test_1
test_1  | 2021-12-15 12:53:15.611667: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
test_1  | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
test_1  | 2021-12-15 12:53:15.632027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:15.651586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:15.651933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.198873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.199203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.199453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2021-12-15 12:53:16.201678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 1672 MB memory:  -> device: 0, name: NVIDIA GeForce GT 710, pci bus id: 0000:01:00.0, compute capability: 3.5
tensor_test_1 exited with code 0
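(For a more explicit check, swapping the command for something like the following should print the detected GPU list directly rather than leaving it to be inferred from the log lines; tf.config.list_physical_devices exists in these TF 2.x images:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
)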
CRCinAU commented 2 years ago

I think I managed to get this working!

Firstly, this is on Ubuntu 20.04. We need to install Python 3.7, and do all of this as the root user:

# add-apt-repository ppa:deadsnakes/ppa
# apt-get update
# apt-get install python3.7
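(Depending on the base install, the matching venv module may also be needed for the next step; on Ubuntu that would be something like:

# apt-get install python3.7-venv
)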

Create a python 3.7 venv and activate it:

# python3.7 -m venv /root/python-3.7
# cd /root/python-3.7
# source bin/activate

Now we want to install a version of torch that includes the architecture support we need. I chose the same version of torch that's used in the deepstack install:

# pip install torch==1.6.0+cu101 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html
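(A quick sanity check that the cu101 wheel is what actually ended up in the venv; torch.version.cuda should report 10.1 here:

# python -c "import torch; print(torch.__version__, torch.version.cuda)"
)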

Run deepstack and map in the alternative torch package:

# docker run --gpus all -e VISION-DETECTION=True -e VISION-FACE=True -v /root/python-3.7/lib/python3.7/site-packages/torch:/usr/local/lib/python3.7/dist-packages/torch -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu

This maps in the alternative torch package, which in my case supports sm_35.

Results:

[GIN] 2021/12/16 - 02:20:54 | 200 |  328.902025ms |     172.31.1.89 | POST     "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:06 | 200 |    225.5783ms |     172.31.1.89 | POST     "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:09 | 200 |  233.602927ms |     172.31.1.89 | POST     "/v1/vision/detection"

Compared to running on a 4GB Jetson Nano:

[GIN] 2021/12/16 - 02:14:04 | 200 |  278.531116ms |     172.31.1.89 | POST     /v1/vision/detection
[GIN] 2021/12/16 - 02:14:06 | 200 |   292.32564ms |     172.31.1.89 | POST     /v1/vision/detection
[GIN] 2021/12/16 - 02:14:07 | 200 |  270.695522ms |     172.31.1.89 | POST     /v1/vision/detection

Output of nvidia-smi from within the deepstack container:

# docker exec -ti admiring_germain nvidia-smi
Thu Dec 16 02:33:00 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 N/A |                  N/A |
| 33%   34C    P8    N/A /  N/A |   1178MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+