naisy / realtime_object_detection

Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV. No Bugs No Worries. Enjoy!
MIT License
101 stars 36 forks

What happens if we split the model at other nodes? #49

Closed nvnnghia closed 5 years ago

nvnnghia commented 5 years ago

Why don't you split the model in the middle of the graph? For example, if we have 20 convolution layers, we could break them into 2 parts. What would happen if we did that?

naisy commented 5 years ago

Hi @nvnnghia,

I split at this node because it is easy to split there, and because the GPU part and the CPU part then take about the same processing time. Of course, the model can be split at any node.

I tried splitting Mask R-CNN at the 'Gather' node because it is easy to split there, but the processing time got worse.

nvnnghia commented 5 years ago

Thanks for your response. I still wonder why the speed improves so much. If we just split the model into 2 parts, ideally the speed can be 2 times faster, but in fact it is much more than 2 times faster. Can you explain this?

naisy commented 5 years ago

Hi, @nvnnghia,

First, the part I now run on the CPU is slow when it runs on the GPU. The execution speed of tf.where depends on clock frequency (Hz), so it is faster to run it on the CPU, which has a higher clock. This alone improves throughput from 9 FPS to 19 FPS on TX2. (ssd_mobilenet_v1_coco_2018_01_28, 640x480 image size, without visualization)

Next, I separate the execution of the model into a GPU thread and a CPU thread, and run the other Python processing (drawing etc.) in the main thread. As a result, TX2 improves to 31.2 FPS.
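
For readers who want to see the shape of this thread layout, here is a minimal sketch using only the Python standard library. It is not the code from this repository: `gpu_part`, `cpu_part`, `stage` and `run_pipeline` are hypothetical names, and the two stage functions are trivial stand-ins for the two halves of the split graph. The overlap only helps if the heavy calls release the GIL, which is assumed here.

```python
import queue
import threading

SENTINEL = None  # marks the end of the frame stream


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker downstream
            break
        outbox.put(fn(item))


def gpu_part(frame):
    # Hypothetical stand-in for the first half of the graph (GPU thread).
    return {"frame": frame, "raw": frame * 2}


def cpu_part(result):
    # Hypothetical stand-in for the second half of the graph (CPU thread).
    result["boxes"] = result["raw"] + 1
    return result


def run_pipeline(frames):
    gpu_in = queue.Queue(maxsize=2)  # small buffers keep latency bounded
    cpu_in = queue.Queue(maxsize=2)
    done = queue.Queue()             # unbounded so the workers never block on it

    workers = [
        threading.Thread(target=stage, args=(gpu_part, gpu_in, cpu_in), daemon=True),
        threading.Thread(target=stage, args=(cpu_part, cpu_in, done), daemon=True),
    ]
    for w in workers:
        w.start()

    # Main thread: feed frames, then collect results (drawing would go here).
    for frame in frames:
        gpu_in.put(frame)
    gpu_in.put(SENTINEL)

    results = []
    while True:
        item = done.get()
        if item is SENTINEL:
            break
        results.append(item)

    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for r in run_pipeline([1, 2, 3]):
        print(r["frame"], r["boxes"])
```

Because each stage has a single thread and the queues are FIFO, results come out in frame order. While the CPU thread finishes frame N, the GPU thread can already work on frame N+1, so throughput is limited by the slowest stage rather than by the sum of all stages.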

Together, these tunings improve speed by about 3 times (9 FPS -> 31.2 FPS).
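
As a quick check of the arithmetic, the per-step and overall speedups can be computed from the FPS figures quoted above. The variable names are only illustrative.

```python
# FPS figures on TX2 as quoted in this thread.
baseline_fps = 9.0    # whole model on GPU
cpu_split_fps = 19.0  # tf.where part moved to the CPU
threaded_fps = 31.2   # plus separate GPU, CPU and main threads

split_speedup = cpu_split_fps / baseline_fps   # gain from the CPU split alone
thread_speedup = threaded_fps / cpu_split_fps  # additional gain from threading
total_speedup = threaded_fps / baseline_fps    # product of the two steps

print(round(split_speedup, 2), round(thread_speedup, 2), round(total_speedup, 2))
# -> 2.11 1.64 3.47
```

The two gains multiply, which is why the total is larger than the 2 times one might expect from splitting the model into two parts.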

nvnnghia commented 5 years ago

Thank you very much for the detailed explanation.