Open CRCinAU opened 2 years ago
I'm trying to get GPU processing working on my older 2GB GeForce GT 710 - which I believe should be about as fast as a Jetson Nano...

When I try to run deepstack with GPU enabled, I get:

Is there a way to enable sm_35 support in the pytorch used in these containers? I can't quite see where it gets set...

Here's the nvidia-smi output from within the container:

Supported versions seem to be:
I've been testing more and more - and with this docker-compose.yaml file, tensorflow detects the GPU ok:
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
Output:
Creating tensor_test_1 ... done
Attaching to tensor_test_1
test_1 | 2021-12-15 12:53:15.611667: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
test_1 | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
test_1 | 2021-12-15 12:53:15.632027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:15.651586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:15.651933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.198873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.199203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.199453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.201678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 1672 MB memory: -> device: 0, name: NVIDIA GeForce GT 710, pci bus id: 0000:01:00.0, compute capability: 3.5
tensor_test_1 exited with code 0
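As a slightly more explicit check than tf.test.gpu_device_name() (which only exercises the device-creation logging above), a short Python sketch like the following prints the detected GPUs and their details directly - purely illustrative, using the standard TF 2.x APIs:

import tensorflow as tf

# List the physical GPUs TensorFlow can see.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

for gpu in gpus:
    # On recent TF builds this includes device_name and compute_capability.
    print(tf.config.experimental.get_device_details(gpu))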
I think I managed to get this working!
Firstly, this is on Ubuntu 20.04 - and we need to install Python 3.7. Do all of this as the root user:
# add-apt-repository ppa:deadsnakes/ppa
# apt-get update
# apt-get install python3.7
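Note: on a stock Ubuntu 20.04 install, the venv module for deadsnakes builds ships as a separate package, so if the venv creation below fails you may also need (an assumption about your base install):
# apt-get install python3.7-venv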
Create a Python 3.7 venv and activate it:
# python3.7 -m venv /root/python-3.7
# cd /root/python-3.7
# source bin/activate
Now we want to install a version of torch that includes sm_35 support. I chose the same version of torch that's used in the deepstack install:
# pip install torch==1.6.0+cu101 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html
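Before wiring this into the container, it's worth sanity-checking the wheel from inside the activated venv - a hedged check; torch.cuda.get_device_capability(0) should report (3, 5) for a GT 710 if the driver is working:
# python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_capability(0))"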
Run deepstack and map in the alternative torch package:
# docker run --gpus all -e VISION-DETECTION=True -e VISION-FACE=True -v /root/python-3.7/lib/python3.7/site-packages/torch:/usr/local/lib/python3.7/dist-packages/torch -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu
This maps in the alternative torch package, which in my case supports sm_35.
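To confirm the container is actually picking up the mapped-in package rather than its bundled one, something like this should print the substituted version (admiring_germain is the auto-generated container name that appears later in this post - substitute your own; this also assumes python3 is on the container's PATH):
# docker exec -ti admiring_germain python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"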
Results:
[GIN] 2021/12/16 - 02:20:54 | 200 | 328.902025ms | 172.31.1.89 | POST "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:06 | 200 | 225.5783ms | 172.31.1.89 | POST "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:09 | 200 | 233.602927ms | 172.31.1.89 | POST "/v1/vision/detection"
Compared to running on a Jetson Nano 4GB:
[GIN] 2021/12/16 - 02:14:04 | 200 | 278.531116ms | 172.31.1.89 | POST /v1/vision/detection
[GIN] 2021/12/16 - 02:14:06 | 200 | 292.32564ms | 172.31.1.89 | POST /v1/vision/detection
[GIN] 2021/12/16 - 02:14:07 | 200 | 270.695522ms | 172.31.1.89 | POST /v1/vision/detection
Output of nvidia-smi from within the deepstack container:
# docker exec -ti admiring_germain nvidia-smi
Thu Dec 16 02:33:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 N/A | N/A |
| 33% 34C P8 N/A / N/A | 1178MiB / 2002MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+