tue-mps-edu / asd-engd-project-2019-thermal-object-detection

This is the repository for the ASD PDEng module 2 project (2020) for AIIM
BSD 3-Clause "New" or "Revised" License

Slow Inference time using Thermal Camera #38

Closed Hrayo712 closed 4 years ago

Hrayo712 commented 4 years ago

Our current system is able to fetch images from the camera and run inference on them. However, inference time performance is below expectations: current results show an average inference time of 25 ms, which does not meet the requirements specification (10 ms).

One bottleneck that might cause this performance degradation is the simple way in which the trt_ssd.py script runs inference, that is, sequentially (i.e. no threading/parallelism).
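For reference, the sequential pattern looks roughly like the sketch below. This is only an illustration of the loop structure described above, not the actual trt_ssd.py code; run_inference is a placeholder for the TensorRT SSD call.

```python
import cv2


def run_inference(frame):
    """Placeholder for the actual TensorRT SSD inference call."""
    return []


# Each iteration blocks on the camera read before inference can start,
# so capture latency adds directly to the per-frame time.
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()   # blocking read
    if not ok:
        break
    detections = run_inference(frame)
cap.release()
```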

Another possible source of slow inference is the mechanism used to fetch data from the camera. Currently, the code relies on OpenCV's internal mechanisms to handle the camera device interfacing (cv2.VideoCapture(0)). In other words, no specific pipeline parameters are provided to the function; it is left to OpenCV to figure out how to open the device. Some investigation has already been done on this matter, tested on an RGB USB camera (without threading), and performance degradation can be observed when the GStreamer pipeline is not properly specified.
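As a sketch of what "properly specifying the pipeline" could look like, an explicit GStreamer string can be passed to cv2.VideoCapture with the CAP_GSTREAMER backend. The exact elements, caps, and device path below are assumptions (a V4L2 device at /dev/video0) and would need to be adapted to the thermal camera.

```python
import cv2

# Explicit GStreamer pipeline instead of the bare cv2.VideoCapture(0).
# drop=true / max-buffers=1 keep only the newest frame in the appsink,
# so inference is not fed stale frames.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw, width=640, height=480, framerate=30/1 ! "
    "videoconvert ! video/x-raw, format=BGR ! "
    "appsink drop=true max-buffers=1"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
```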

A third source of performance degradation is the use of Python code instead of NVIDIA's C++ API. The overhead imposed by the Python interpreter can cause some degree of slowdown, although it is still unknown how large this penalty is. That said, users have reported an inference time of ~12.69 ms (no multi-threading) in Python on the Jetson Xavier AGX (JetPack 4.2.2 + TensorRT 5). This would imply that the requirement can still be met in spite of the Python performance penalty. However, performance differences have also been observed across TensorRT versions.

Tasks:

Hrayo712 commented 4 years ago

After looking into this, I found that specifying the GStreamer pipeline, as well as threading the fetching of images from the camera device, greatly improves performance. These two approaches were therefore incorporated, allowing the system to reach an average of 100 FPS, as expected.
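A minimal sketch of how the two fixes can be combined is shown below, assuming a GStreamer pipeline string like the one above. The class and method names are illustrative only, not taken from the repository.

```python
import threading

import cv2


class ThreadedCapture:
    """Reads frames on a background thread so inference never waits on camera I/O."""

    def __init__(self, gst_pipeline):
        self.cap = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()

    def _update(self):
        # Continuously grab the newest frame in the background.
        while self.running:
            ok, frame = self.cap.read()
            if not ok:
                continue
            with self.lock:
                self.frame = frame

    def read(self):
        # Return a copy of the most recent frame (None until the first frame arrives).
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def stop(self):
        self.running = False
        self.thread.join()
        self.cap.release()
```

With this structure, the inference loop simply calls read() and always processes the latest available frame, instead of blocking on the camera for every iteration.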