naisy / realtime_object_detection

Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV. No Bugs No Worries. Enjoy!
MIT License
101 stars 36 forks

Query regarding Performance of ssd_mobilenet in Xavier #73

Open Niran89 opened 4 years ago

Niran89 commented 4 years ago

Dear naisy,

Thank you for your work. I am currently working on porting the SSD MobileNet object detection algorithm to the Xavier platform.

The current configuration used in Xavier,

JetPack - 4.1
TensorFlow - 1.12.0 with GPU support
TensorRT - 5.0.3 with CUDA 10

I have set up your project on the Xavier and am able to run it successfully. I have made the observations below regarding performance by running a 1280x720 video stream as input. PFA.

performance_xavier

Note : During all these experiments, the Xavier is set to Max-N mode.

Can you help me to get a clarity on the following queries,

  1. Is my observation of the performance correct? Is this the expected performance on the Xavier? I have referred to the "Current Max Performance of ssd_mobilenet_v1_coco_2018_01_28" table in the README.md. From the table I understand that on the Xavier in MAX-N mode with visualization, for a 1280x720 input, the FPS should be 48. But I got only 35 fps (nms_v2, ssd_mobilenet_v1_coco_2018_01_28). I am not sure why there is an almost 10+ fps drop compared to the expected value. I haven't made any changes to the code.

  2. As per my understanding, a TRT model is supposed to perform better than the normal TF model. But from my observations above, I see a drop in fps when using TRT (comparison: trt_v1 vs nms_v2).

  3. From my overall understanding, nms_v2 performs better than nms_v1 and trt_v1. What are the major differences between nms_v1, nms_v2, and trt_v1?

Appreciate your help.

Regards, Niran

naisy commented 4 years ago

Hi @Niran89,

  1. About FPS: Please try jetson_clocks.sh. This is probably the reason you are about 10 FPS slower.

MAX-N mode sets the maximum number of active cores and the clock frequencies (Hz) of the CPU and GPU. This setting persists after rebooting.

jetson_clocks.sh sets the CPU and GPU to their maximum clocks. This setting does not persist after a reboot; by default, the kernel boots in a low-clock mode each time.
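One way to check whether the clocks are actually pinned at maximum is to compare the current and maximum CPU frequencies exposed via sysfs. The sketch below is a generic illustration, not part of this repository: the cpufreq file names are the standard Linux ones, and the temporary fake directory exists only so the example is self-contained (on a real Jetson you would point at /sys/devices/system/cpu/cpu0/cpufreq).

```python
import os
import tempfile

def at_max_clock(cpu_dir):
    """Return True if scaling_cur_freq equals scaling_max_freq for one CPU."""
    def read(name):
        with open(os.path.join(cpu_dir, name)) as f:
            return int(f.read().strip())
    return read("scaling_cur_freq") == read("scaling_max_freq")

# Build a fake cpufreq directory so this sketch runs anywhere.
fake = tempfile.mkdtemp()
with open(os.path.join(fake, "scaling_max_freq"), "w") as f:
    f.write("2265600\n")
with open(os.path.join(fake, "scaling_cur_freq"), "w") as f:
    f.write("1190400\n")  # a low default clock, before jetson_clocks.sh runs

print(at_max_clock(fake))
```

If this prints False on a real board even after jetson_clocks.sh, the clock setting did not take effect (for example, after a reboot).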

  2. About the TF-TRT model: This is probably because my code is old. The old TF-TRT was slow due to overhead; recent TF-TRT versions seem to be faster. I would like to look into it when I have time.

  3. About nms_v1, nms_v2, trt_v1: ssd_mobilenet_v1_coco_2017_11_17 (nms_v1) and ssd_mobilenet_v1_coco_2018_01_28 (nms_v2) differ in the non-maximum suppression part. These are the nodes targeted by the split model. Similarly, trt_v1 targets different nodes.

The split model targets the following nodes:

nms_v1: Postprocessor/convert_scores and Postprocessor/ExpandDims_1
nms_v2: Postprocessor/Slice, Postprocessor/ExpandDims_1 and Postprocessor/stack_1
trt_v1: Postprocessor/Slice and Postprocessor/ExpandDims_1

These models share the same SSD MobileNet v1 part, but the non-maximum suppression part is different.

About split model: https://github.com/naisy/realtime_object_detection/blob/master/About_Split-Model.md
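The split points above can be summarized in a small sketch. The node names are taken directly from the comment above; the lookup table and helper function are hypothetical, not code from the repository:

```python
# Hypothetical mapping from model_type to the graph nodes at which the
# split model cuts the frozen graph (node names from the comment above).
SPLIT_NODES = {
    "nms_v1": ["Postprocessor/convert_scores",
               "Postprocessor/ExpandDims_1"],
    "nms_v2": ["Postprocessor/Slice",
               "Postprocessor/ExpandDims_1",
               "Postprocessor/stack_1"],
    "trt_v1": ["Postprocessor/Slice",
               "Postprocessor/ExpandDims_1"],
}

def split_nodes(model_type):
    """Return the split-point node names for a given model_type."""
    try:
        return SPLIT_NODES[model_type]
    except KeyError:
        raise ValueError("unknown model_type: %s" % model_type)

print(split_nodes("nms_v2"))
```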

Niran89 commented 4 years ago

Hi Naisy,

Thank you for the reply and sorry for the delayed response from my side.

FYI, I had already set the clocks on the Xavier using "sudo ./jetson_clocks.sh". But even after setting the clocks and running the Xavier in MAX-N mode, I still see a 10 fps drop while running the ssd_mobilenet_v1_coco_2018_01_28 model with the car image (544x288 resolution) you used.

From your GitHub, I understand that you achieved almost 52 fps on the Xavier with the car image, but I could attain only 40 fps. Can you let me know what I am still missing? Below is the configuration I used before running the model:

In config.yml:

model_type: 'nms_v2'
model_path: 'models/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'
force_gpu_compatible: False/True (tried with both)
visualize: True
width: 544
height: 288
split_model: True
split_shape: 1917

The other configurations are the same as yours. Once these were set, I ran python run_image.py, after which I observed 39 fps on the Xavier.

My current Setup in Xavier,

JetPack - 4.1
Python - 2.7
OpenCV - 3.4.2
TensorFlow - 1.12.0

Appreciate your help.

Thanks & Regards, Niran

naisy commented 4 years ago

Hi @Niran89,

run_image.py and run_video.py are slower than run_stream.py. The reason is that the frame-reading method is different. https://github.com/naisy/realtime_object_detection/issues/52#issuecomment-419067660

Can you try with usb webcam?
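The usual reason a streaming reader is faster is that frames are grabbed on a background thread, so inference never blocks on camera I/O. The Python 3 sketch below illustrates that pattern generically; FakeCapture stands in for cv2.VideoCapture so the example is self-contained, and none of these class names come from the repository:

```python
import threading
import queue
import time

class FakeCapture:
    """Stand-in for cv2.VideoCapture: yields numbered 'frames'."""
    def __init__(self, n_frames=5):
        self.i = 0
        self.n = n_frames
    def read(self):
        if self.i >= self.n:
            return False, None
        time.sleep(0.001)  # simulate camera latency
        self.i += 1
        return True, "frame-%d" % self.i

class ThreadedReader:
    """Grab frames on a background thread so the consumer never waits on I/O."""
    def __init__(self, cap):
        self.cap = cap
        self.q = queue.Queue(maxsize=2)  # small buffer keeps frames fresh
        t = threading.Thread(target=self._loop)
        t.daemon = True
        t.start()
    def _loop(self):
        while True:
            ok, frame = self.cap.read()
            self.q.put((ok, frame))
            if not ok:
                break
    def read(self):
        return self.q.get()

reader = ThreadedReader(FakeCapture())
frames = []
while True:
    ok, frame = reader.read()
    if not ok:
        break
    frames.append(frame)
print(frames)
```

With a real camera, the main loop spends its time on inference while the reader thread hides the capture latency; a blocking read (as in run_image.py/run_video.py style processing) pays that latency on every frame.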

Niran89 commented 4 years ago

Hi @naisy ,

Thank you once again...

As suggested, I ran the algorithm on a stream from a USB camera, but I still see a 10 fps drop. The configuration I used is:

Hardware: Xavier
code: run_stream.py
width: 1280
height: 720
visualize: True
force_gpu_compatible: False
Max mode: MAX-N mode, and the clock is also set

From your performance table, for the above configuration, the FPS on the Xavier is supposed to be 48. But I got only around 39 fps. Is there anything I am still missing?

Note: I tried with multiple objects in the scene. For any number of objects, and even with no objects in the scene, the fps stays at 39.

Appreciate your help...

Thanks & Regards, Niran

naisy commented 4 years ago

Hi @Niran89,

Can you show tegrastats?

Niran89 commented 4 years ago

Hi @naisy ,

PFA the tegrastats output recorded while running the object detection code on the Xavier. The configuration is the same as above. You can also see the recorded FPS in the same image.

Thanks & Regards, Niran

tegrastats_nvidia

naisy commented 4 years ago

Hi @Niran89,

Looking at the tegrastats results, it seems that both the GPU and the CPU are doing enough work. But 36.4 fps is too slow.

If you are running over the network, it will be slower. For example: ssh -C -Y ubuntu@xavier_ip_address. Other than that, I don't know why it is so slow, sorry.
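For readers interpreting tegrastats output: the GPU load appears in the GR3D_FREQ field and per-core CPU loads inside the CPU [...] brackets. The sketch below shows a rough parser; the sample line is a simplified assumption based on typical Jetson output, since the exact field layout varies by JetPack version:

```python
import re

# Simplified example of a tegrastats line (field layout is an assumption;
# real output differs between JetPack versions).
SAMPLE = ("RAM 4722/15692MB (lfb 1292x4MB) "
          "CPU [54%@2265,31%@2265,28%@2265,26%@2265] "
          "GR3D_FREQ 57%")

def gpu_load(line):
    """Return the GR3D (GPU) utilization percentage, or None if absent."""
    m = re.search(r"GR3D_FREQ (\d+)%", line)
    return int(m.group(1)) if m else None

def cpu_loads(line):
    """Return per-core CPU utilization percentages as a list."""
    m = re.search(r"CPU \[([^\]]+)\]", line)
    if not m:
        return []
    return [int(core.split("%")[0]) for core in m.group(1).split(",")]

print(gpu_load(SAMPLE))   # GPU utilization from the sample line
print(cpu_loads(SAMPLE))  # per-core CPU utilization
```

A GPU pinned near 99% would suggest the model itself is the bottleneck; moderate GPU load with one saturated CPU core would instead point at pre/post-processing or frame I/O.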

Niran89 commented 4 years ago

Hi @naisy ,

Thank you for the reply.

I am sure that the code is not being run over the network as you mentioned above. I am running it directly, as described in your README file.

Anyway, thanks for your support so far. Kindly let me know if you come across any pointers regarding the performance drop in the future. Looking forward to it.

Thanks & Regards, Niran