jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Why is the inference speed on the Jetson Xavier NX so slow? #357

Closed: wangchangquan closed this issue 3 years ago

wangchangquan commented 3 years ago

I use yolov4-tiny as the test model. It runs at about 10 FPS on the TX1, which only has roughly 1 TOPS. The Xavier NX offers up to 21 TOPS, so I expected the model to reach around 100 FPS there. But when I test on the Xavier NX, I only get 20-30 FPS, and even after converting the model to TensorRT it only reaches about 60 FPS, which matches your results. So my question is: the NX has roughly 10x the compute, so why doesn't the inference speed increase by anywhere near that much?

jkjung-avt commented 3 years ago

FPS is not determined by the GPU (TOPS) alone. You also need to consider image capturing/decoding, pre-processing, data copying (CPU memory <-> GPU memory), post-processing, video displaying, etc.

You have to do fairly detailed profiling to identify where the bottleneck is.
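
As a starting point, something like the rough timing sketch below can separate the pipeline stages. It assumes `trt_yolo` is a TrtYOLO detector already built the way trt_yolo.py in this repo builds it, `vis` is a BBoxVisualization instance, and camera index 0 is available; only the "detect" stage benefits from more TOPS.

```python
import time
import cv2

# Accumulate per-stage wall-clock time over a few hundred frames.
cap = cv2.VideoCapture(0)
stage_ms = {'capture': 0.0, 'detect': 0.0, 'draw+show': 0.0}
frames = 0

while frames < 300:
    t0 = time.perf_counter()
    ok, img = cap.read()                                   # image capture/decode
    if not ok:
        break
    t1 = time.perf_counter()
    boxes, confs, clss = trt_yolo.detect(img, conf_th=0.3) # pre-proc + TensorRT inference + post-proc
    t2 = time.perf_counter()
    img = vis.draw_bboxes(img, boxes, confs, clss)         # drawing
    cv2.imshow('profile', img)                             # display
    cv2.waitKey(1)
    t3 = time.perf_counter()

    stage_ms['capture'] += (t1 - t0) * 1000.0
    stage_ms['detect'] += (t2 - t1) * 1000.0
    stage_ms['draw+show'] += (t3 - t2) * 1000.0
    frames += 1

for stage, total in stage_ms.items():
    print('%-10s %6.2f ms/frame' % (stage, total / frames))
```

If "capture" or "draw+show" dominates, a faster GPU will not help much; that is where the extra TOPS of the NX gets lost.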

wangchangquan commented 3 years ago

Thank you for your reply. There is an official NVIDIA benchmark (https://github.com/NVIDIA-AI-IOT/jetson_benchmarks) where all the models reach hundreds of frames per second. That result should reflect only the GPU inference speed. My application is to detect fast-moving objects in real time, and we require the detection rate to reach 120 FPS. The whole pipeline is that the camera grabs an image and the NX processes it immediately. Do you have any good suggestions for speeding this up?

jkjung-avt commented 3 years ago

You might want to look into the use of NVIDIA DeepStream SDK.

In addition, think about whether you could do batched inference for your workload. This discussion might also be helpful: https://github.com/AlexeyAB/darknet/pull/5453#issuecomment-663593512 and https://github.com/AlexeyAB/darknet/pull/5453#issuecomment-665105334.
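
To illustrate the batching idea, here is a minimal sketch of running one TensorRT execution over several frames at once, so the per-call launch and copy overhead is amortized. It assumes a hypothetical engine file `yolov4-tiny-416-batch4.trt` that was built with an explicit, fixed batch dimension of 4 (input shape (4, 3, 416, 416)), and that the frames have already been preprocessed; the buffer-allocation pattern follows the standard TensorRT Python + pycuda usage, not any particular script in this repo.

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
ENGINE_PATH = 'yolov4-tiny-416-batch4.trt'   # hypothetical batch-4 engine
BATCH = 4

# Deserialize the engine and create an execution context.
with open(ENGINE_PATH, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding.
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))   # fixed shapes assumed
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))
stream = cuda.Stream()

def infer_batch(imgs):
    """Run one TensorRT execution over a stack of preprocessed frames.

    `imgs` is a float32 array of shape (BATCH, 3, 416, 416); resizing,
    BGR->RGB conversion and normalization are assumed done beforehand.
    """
    np.copyto(host_bufs[0], imgs.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for h, d in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(h, d, stream)
    stream.synchronize()
    return [h.copy() for h in host_bufs[1:]]   # raw outputs for all 4 frames
```

Note that batching trades latency for throughput: you wait until 4 frames are collected before running the engine, so whether it helps depends on whether your 120 FPS requirement is about throughput or per-frame latency. DeepStream handles this kind of batching (and zero-copy capture/decode) for you.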