naisy / realtime_object_detection

Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV. No Bugs No Worries. Enjoy!
MIT License

Tensorflow realtime_object_detection on Jetson Xavier/TX2/TX1, PC

About this repository

Forked from GustavZ/realtime_object_detection: https://github.com/GustavZ/realtime_object_detection
This fork focuses on the model split technique for ssd_mobilenet_v1.

Download a model from the tf1_detection_model_zoo:

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz

and from the TensorFlow DeepLab Model Zoo:

wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz

Supported models

Model model_type split_shape
ssd_mobilenet_v1_coco_11_06_2017 nms_v0 1917
ssd_mobilenet_v1_coco_2017_11_17 nms_v1 1917
ssd_inception_v2_coco_2017_11_17 nms_v1 1917
ssd_mobilenet_v1_coco_2018_01_28 nms_v2 1917
ssdlite_mobilenet_v2_coco_2018_05_09 nms_v2 1917
ssd_inception_v2_coco_2018_01_28 nms_v2 1917
ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_03 nms_v2 1917
ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03 nms_v2 1917
ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 nms_v2 51150
ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 nms_v2 51150
ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync_2018_07_03 nms_v2 3000
faster_rcnn_inception_v2_coco_2018_01_28 faster_v2
faster_rcnn_resnet50_coco_2018_01_28 faster_v2
faster_rcnn_resnet101_coco_2018_01_28 faster_v2
faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 faster_v2
mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 mask_v1
mask_rcnn_inception_v2_coco_2018_01_28 mask_v1
mask_rcnn_resnet101_atrous_coco_2018_01_28 mask_v1
mask_rcnn_resnet50_atrous_coco_2018_01_28 mask_v1
deeplabv3_mnv2_pascal_train_aug_2018_01_29 deeplab_v3
deeplabv3_mnv2_pascal_trainval_2018_01_29 deeplab_v3
deeplabv3_pascal_train_aug_2018_01_04 deeplab_v3
deeplabv3_pascal_trainval_2018_01_04 deeplab_v3

See also:

Getting Started:

Requirements:

pip install --upgrade pyyaml

Also requires OpenCV >= 3.1 and TensorFlow >= 1.4 (1.6 works well).

config.yml

Image

with run_image.py
Create an 'images' directory and put image files (jpeg, jpg, png) in it.
Subdirectories can also be used.

image_input: 'images'       # input image dir
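A minimal sketch of how such a directory could be scanned, including subdirectories. This is not taken from run_image.py; the function name `list_image_files` and the case-insensitive extension match are assumptions made for illustration. Only the extensions and the subdirectory behavior come from the text above.

```python
import os

# Extensions named above; matching them case-insensitively is an assumption.
IMAGE_EXTENSIONS = ('.jpeg', '.jpg', '.png')


def list_image_files(image_input):
    """Return sorted paths of all image files under image_input, including subdirectories."""
    found = []
    for root, _dirs, files in os.walk(image_input):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == '__main__':
    # 'images' matches the image_input value shown above.
    for path in list_image_files('images'):
        print(path)
```

If the directory does not exist, `os.walk` yields nothing and the list is simply empty.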

Movie

with run_video.py

movie_input: 'input.mp4'    # mp4 or avi. Movie file.

Camera

with run_stream.py
The camera input value is an OpenCV capture argument.

Save to file

Without Visualization

I do not know why, but on TX2 setting force_gpu_compatible: True makes it faster.

With Visualization

Visualization is heavy. The visualization FPS can be limited.
The displayed FPS is the detection FPS.
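The idea of limiting visualization FPS can be sketched as follows. This is an illustration only, assuming that frames arriving too soon after the last drawn frame are skipped; the class name `VisualizationLimiter` and its methods are hypothetical and are not this repository's implementation.

```python
import time


class VisualizationLimiter:
    """Decide whether a frame should be drawn so that drawing stays at or below max_fps."""

    def __init__(self, max_fps):
        self.min_interval = 1.0 / max_fps  # minimum seconds between two drawn frames
        self.last_draw = None              # timestamp of the last drawn frame
        self.drops = 0                     # frames skipped by the limiter

    def should_draw(self, now=None):
        """Return True if enough time has passed since the last drawn frame."""
        if now is None:
            now = time.monotonic()
        if self.last_draw is None or now - self.last_draw >= self.min_interval:
            self.last_draw = now
            return True
        self.drops += 1
        return False


if __name__ == '__main__':
    limiter = VisualizationLimiter(max_fps=30)
    # Simulated frame timestamps in seconds: only frames at least 1/30 s apart are drawn.
    for t in (0.00, 0.01, 0.02, 0.04, 0.05, 0.08):
        print(t, limiter.should_draw(now=t))
```

Detection can keep running at full speed while only the drawing step is throttled, which is why the detection FPS and the visualization FPS can differ.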

# ssd_mobilenet_v1_coco_2018_01_28
model_type: 'nms_v2'
model_path: 'models/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'
label_path: 'models/labels/mscoco_label_map.pbtxt'
num_classes: 90

learned size split_shape
300x300 1917
400x400 3309
500x500 5118
600x600 7326
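One way to read these split_shape values is as the number of SSD default boxes for each input size. The sketch below rests on assumptions that are not stated in this README: the common ssd_mobilenet_v1 layout of six feature maps, the first at stride 16 and each following one half the size (rounded up), with 3 boxes per cell on the first map and 6 on the others. Under those assumptions it reproduces the four values in the table; for a model trained with a different anchor configuration the result will differ.

```python
import math


def ssd_mobilenet_v1_anchor_count(size):
    """Count default boxes for a square input of the given size.

    Assumed layout (common ssd_mobilenet_v1 configuration, not read from this repository):
    six feature maps, first at stride 16, each later map half the size rounded up,
    3 boxes per cell on the first map and 6 on the remaining five.
    """
    side = math.ceil(size / 16)
    total = 0
    for layer in range(6):
        boxes_per_cell = 3 if layer == 0 else 6
        total += side * side * boxes_per_cell
        side = math.ceil(side / 2)
    return total


for size in (300, 400, 500, 600):
    print('%dx%d' % (size, size), ssd_mobilenet_v1_anchor_count(size))
```

For 300x300 the feature maps are 19, 10, 5, 3, 2 and 1 cells wide, giving 1083 + 600 + 150 + 54 + 24 + 6 = 1917.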

See also: Learn Split Model

Console Log

FPS:25.8  Frames:130 Seconds: 5.04248   | 1FRAME total: 0.11910   cap: 0.00013   gpu: 0.03837   cpu: 0.02768   lost: 0.05293   send: 0.03834   | VFPS:25.4  VFrames:128 VDrops: 1 

FPS: detection FPS, averaged over fps_interval (5 sec).
Frames: number of frames detected in fps_interval.
Seconds: actual running time of fps_interval.


1FRAME
total: processing time of one frame. In single-threading (split_model: False), 0.1 means a delay of 0.1 sec and 10 FPS. In multi-threading (split_model: True), this value is only the delay.
cap: time to capture the camera image and transform it for model input.
gpu: sess.run() time of the GPU part.
cpu: sess.run() time of the CPU part.
lost: overhead time, such as sleeps.
send: time spent in the multi-processing queue (blocking and pipe time).


VFPS: visualization FPS.
VFrames: number of frames visualized in fps_interval.
VDrops: frames dropped when multi-processing visualization is the bottleneck.
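The relationship between these fields can be checked against the sample log line above. The following sketch parses that line with a regular expression and recomputes FPS and VFPS from the frame counts and the interval time. The function name `parse_log_line` is hypothetical and the parsing is not part of this repository.

```python
import re

# The sample line from the Console Log section above.
LOG_LINE = ('FPS:25.8  Frames:130 Seconds: 5.04248   | 1FRAME total: 0.11910   '
            'cap: 0.00013   gpu: 0.03837   cpu: 0.02768   lost: 0.05293   '
            'send: 0.03834   | VFPS:25.4  VFrames:128 VDrops: 1')


def parse_log_line(line):
    """Return a dict mapping each 'name: value' field in the line to a float."""
    pairs = re.findall(r'(\w+):\s*([0-9.]+)', line)
    return {name: float(value) for name, value in pairs}


fields = parse_log_line(LOG_LINE)
fps = fields['Frames'] / fields['Seconds']    # detection frames per second
vfps = fields['VFrames'] / fields['Seconds']  # visualized frames per second
print('FPS from Frames/Seconds: %.1f' % fps)
print('VFPS from VFrames/Seconds: %.1f' % vfps)
```

Both recomputed values agree with the FPS and VFPS fields printed in the log line after rounding to one decimal.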

Updates:

My Setup:

NVPMODEL

Mode | Mode Name | Denver 2 cores | Denver 2 frequency | ARM A57 cores | ARM A57 frequency | GPU frequency
0 | Max-N | 2 | 2.0 GHz | 4 | 2.0 GHz | 1.30 GHz
1 | Max-Q | 0 | - | 4 | 1.2 GHz | 0.85 GHz
2 | Max-P Core-All | 2 | 1.4 GHz | 4 | 1.4 GHz | 1.12 GHz
3 | Max-P ARM | 0 | - | 4 | 2.0 GHz | 1.12 GHz
4 | Max-P Denver | 2 | 2.0 GHz | 0 | - | 1.12 GHz

Max-N

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

Max-P ARM (Default)

sudo nvpmodel -m 3
sudo ./jetson_clocks.sh

Show current mode

sudo nvpmodel -q --verbose

Current Max Performance of ssd_mobilenet_v1_coco_2018_01_28

FPS Machine Size Split Model Visualize Mode CPU Watt Ampere Volt-Ampere Model classes
227 PC 160x120 True False - 27-33% 182W 1.82A 183VA frozen_inference_graph.pb 90
223 PC 160x120 True True, Worker 30 FPS Limit - 28-36% 178W 1.77A 180VA frozen_inference_graph.pb 90
213 PC 544x288 True False - 49-52% 178W 1.79A 180VA frozen_inference_graph.pb 90
212 PC 160x120 True True - 30-34% 179W 1.82A 183VA frozen_inference_graph.pb 90
207 PC 544x288 True True, Worker 30 FPS Limit - 48-53% 178W 1.76A 178VA frozen_inference_graph.pb 90
190 PC 544x288 True True - 52-58% 176W 1.80A 177VA frozen_inference_graph.pb 90
174 PC 1280x720 True False - 42-49% 172W 1.72A 174VA frozen_inference_graph.pb 90
163 PC 1280x720 True True, Worker 30 FPS Limit - 47-53% 170W 1.69A 170VA frozen_inference_graph.pb 90
153 PC 1280x720 True True, Worker 60 FPS Limit - 51-56% 174W 1.73A 173VA frozen_inference_graph.pb 90
146 PC 1280x720 True True, Worker No Limit (VFPS:67) - 57-61% 173W 1.70A 174VA frozen_inference_graph.pb 90
77 PC 1280x720 True True - 29-35% 142W 1.43A 144VA frozen_inference_graph.pb 90
60 Xavier 160x120 True False Max-N 34-42% 31.7W 0.53A 54.5VA frozen_inference_graph.pb 90
59 Xavier 544x288 True False Max-N 39-45% 31.8W 0.53A 54.4VA frozen_inference_graph.pb 90
58 Xavier 1280x720 True False Max-N 38-48% 31.6W 0.53A 55.1VA frozen_inference_graph.pb 90
54 Xavier 160x120 True True Max-N 39-44% 31.4W 0.52A 54.4VA frozen_inference_graph.pb 90
52 Xavier 544x288 True True Max-N 39-50% 31.4W 0.55A 56.0VA frozen_inference_graph.pb 90
48 Xavier 1280x720 True True Max-N 44-76% 32.5W 0.54A 55.6VA frozen_inference_graph.pb 90
43 TX2 160x120 True False Max-N 65-76% 18.6W 0.28A 29.9VA frozen_inference_graph.pb 90
40 TX2 544x288 True False Max-N 60-77% 18.0W 0.28A 29.8VA frozen_inference_graph.pb 90
38 TX2 1280x720 True False Max-N 62-75% 17.7W 0.27A 29.2VA frozen_inference_graph.pb 90
37 TX2 160x120 True True Max-N 5-68% 17.7W 0.27A 28.0VA frozen_inference_graph.pb 90
37 TX2 160x120 True False Max-P ARM 80-86% 13.8W 0.22A 23.0VA frozen_inference_graph.pb 90
37 TX2 160x120 True True Max-P ARM 77-80% 14.0W 0.22A 23.1VA frozen_inference_graph.pb 90
35 TX2 544x288 True True Max-N 20-71% 17.0W 0.27A 27.7VA frozen_inference_graph.pb 90
35 TX2 544x288 True False Max-P ARM 82-86% 13.6W 0.22A 22.8VA frozen_inference_graph.pb 90
34 TX2 1280x720 True False Max-P ARM 82-87% 13.6W 0.21A 22.2VA frozen_inference_graph.pb 90
32 TX2 544x288 True True Max-P ARM 79-85% 13.4W 0.21A 22.3VA frozen_inference_graph.pb 90
31 TX2 1280x720 True True Max-N 46-75% 16.9W 0.26A 28.1VA frozen_inference_graph.pb 90
27 TX1 160x120 True False - 71-80% 17.3W 0.27A 28.2VA frozen_inference_graph.pb 90
26 TX2 1280x720 True True Max-P ARM 78-86% 12.6W 0.20A 21.2VA frozen_inference_graph.pb 90
26 TX1 544x288 True False - 74-82% 17.2W 0.27A 29.0VA frozen_inference_graph.pb 90
26 TX1 160x120 True True - 69-81% 17.1W 0.27A 28.7VA frozen_inference_graph.pb 90
24 TX1 1280x720 True False - 73-80% 17.6W 0.27A 29.3VA frozen_inference_graph.pb 90
23 TX1 544x288 True True - 77-82% 16.7W 0.27A 28.2VA frozen_inference_graph.pb 90
19 TX1 1280x720 True True - 78-86% 15.8W 0.26A 26.7VA frozen_inference_graph.pb 90
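Since the table lists both FPS and power draw, one way to compare machines is FPS per watt. The sketch below uses only the 160x120, Visualize False rows copied from the table above. The metric and the helper name `fps_per_watt` are additions for illustration, not something this repository reports.

```python
# (FPS, machine, mode, watts) copied from the 160x120, Visualize=False rows of the table above.
ROWS = [
    (227, 'PC', '-', 182.0),
    (60, 'Xavier', 'Max-N', 31.7),
    (43, 'TX2', 'Max-N', 18.6),
    (37, 'TX2', 'Max-P ARM', 13.8),
    (27, 'TX1', '-', 17.3),
]


def fps_per_watt(fps, watts):
    """Detection frames per second delivered per watt of measured power."""
    return fps / watts


# Rank the rows from most to least power-efficient.
ranked = sorted(ROWS, key=lambda row: fps_per_watt(row[0], row[3]), reverse=True)
for fps, machine, mode, watts in ranked:
    print('%-7s %-10s %5.2f FPS/W' % (machine, mode, fps_per_watt(fps, watts)))
```

Among these rows, the PC has the highest absolute FPS but the lowest FPS per watt, and TX2 in Max-P ARM mode has the highest FPS per watt.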

on Xavier 544x288:

on PC 544x288:

on TX2 544x288:

Youtube

Robot Car and Realtime Object Detection

TX2

Object Detection vs Semantic Segmentation on TX2

TX2

Realtime Object Detection on TX2

TX2

Realtime Object Detection on TX1

TX1

The movie's FPS is a little slower because ssd_mobilenet_v1 was running together with the desktop capture.
Capture command:

gst-launch-1.0 -v ximagesrc use-damage=0 ! nvvidconv ! 'video/x-raw(memory:NVMM),alignment=(string)au,format=(string)I420,framerate=(fraction)25/1,pixel-aspect-ratio=(fraction)1/1' ! omxh264enc !  'video/x-h264,stream-format=(string)byte-stream' ! h264parse ! avimux ! filesink location=capture.avi

Training ssd_mobilenet with own data

https://github.com/naisy/train_ssd_mobilenet

Multi-Threading for Realtime Object Detection

Learn Split Model