obendidi / Tracking-with-darkflow

Real-time people Multitracker using YOLO v2 and deep_sort with tensorflow
GNU General Public License v3.0

Low FPS using GPU. #64

Open migvel opened 6 years ago

migvel commented 6 years ago

Hello,

I'm testing it on a Tesla K80; here are some versions:

It looks like the GPU is recognized by TensorFlow when I run sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) (output below).

However, I get around 0.5 FPS, compared to 10 FPS with Darknet YOLO on the same hardware with the same video, so it looks like it is not using the GPU.

Any ideas on how I could debug this further to see what is keeping the FPS so low?

Thanks.

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-05-25 17:39:35.421195: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-05-25 17:39:37.739344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-25 17:39:37.739823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2018-05-25 17:39:37.739852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
2018-05-25 17:39:37.868862: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7

migvel commented 6 years ago

Just to point out that, looking at nvidia-smi, I can clearly see that the GPU is being used, so I'm not sure why the performance is so low, especially compared with the results listed under "some numbers" by the author.
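
Since nvidia-smi shows the GPU in use, one way to narrow this down is to time each stage of the per-frame loop separately and see where the wall-clock time goes. The following is only a minimal sketch: `StageTimer`, `read_frame`, `detect`, and `track` are hypothetical names, not part of darkflow or deep_sort, and the `time.sleep` calls are placeholders for the real frame capture, YOLO detection, and tracking steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return {stage: mean seconds per call}, slowest stage first."""
        means = {n: self.totals[n] / self.counts[n] for n in self.totals}
        return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical stand-ins for the real pipeline steps (assumptions, not the
# project's functions). Replace the sleeps with the actual calls.
def read_frame():
    time.sleep(0.001)
    return "frame"


def detect(frame):
    time.sleep(0.05)
    return ["detection"]


def track(detections):
    time.sleep(0.002)


timer = StageTimer()
for _ in range(5):
    with timer.stage("read"):
        frame = read_frame()
    with timer.stage("detect"):
        detections = detect(frame)
    with timer.stage("track"):
        track(detections)

report = timer.report()
for name, mean_s in report.items():
    print(f"{name:8s} {mean_s * 1000:.1f} ms/frame")
```

If the detection stage dominates, the problem is likely in the network or its GPU setup; if reading or tracking dominates, the GPU is probably not the bottleneck. The first few frames are best discarded, since they may include one-time initialization cost.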

PBRAOS commented 5 years ago

I had the same issue. I reinstalled all my drivers and now I get 25-30 FPS. I think NVIDIA should bundle the CUDA and cuDNN packages with their drivers; this versioning business is needlessly complicated. PM me and I'll tell you how I worked out my solution.

WARNING:tensorflow:From /home/.../PycharmProjects/Tracking-with-darkflow/deep_sort/generate_detections.py:176: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-10-12 14:11:12.904096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2018-10-12 14:11:12.904116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-12 14:11:12.904119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2018-10-12 14:11:12.904122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2018-10-12 14:11:12.904199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4869 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/.../PycharmProjects/Tracking-with-darkflow/deep_sort/generate_detections.py:320: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Press [ESC] to quit video
Press [ESC] to quit demo
28.717 FPS
Traceback (most recent call last):
File "/home/.../PycharmProjects/Tracking-with-darkflow/run.py", line 32, in tfnet.camera()