wangchangquan closed this issue 3 years ago
FPS is not determined by GPU compute (TOPS) alone. You also need to consider image capture/decoding, pre-processing, data copying (CPU memory <-> GPU memory), post-processing, video display, etc.
You have to do fairly detailed profiling to identify where the bottleneck is.
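To make the profiling suggestion concrete, here is a minimal sketch of timing each pipeline stage separately to locate the bottleneck. The stage names and `time.sleep` bodies below are placeholders, not the real capture/inference code:

```python
import time

def profile_pipeline(stages, frames=30):
    """Time each pipeline stage over several frames.

    `stages` maps a stage name to a callable that processes one
    frame; results are average milliseconds per stage.
    """
    totals = {name: 0.0 for name in stages}
    for _ in range(frames):
        for name, fn in stages.items():
            t0 = time.perf_counter()
            fn()
            totals[name] += time.perf_counter() - t0
    return {name: 1000.0 * t / frames for name, t in totals.items()}

# Dummy stages standing in for real capture/preprocess/infer/postprocess.
stages = {
    "capture":     lambda: time.sleep(0.002),
    "preprocess":  lambda: time.sleep(0.001),
    "inference":   lambda: time.sleep(0.010),  # simulated bottleneck
    "postprocess": lambda: time.sleep(0.001),
}

timings = profile_pipeline(stages, frames=10)
bottleneck = max(timings, key=timings.get)
print("per-stage ms:", timings)
print("bottleneck:", bottleneck)
```

In a real pipeline you would wrap your actual camera read, pre-processing, inference call, and post-processing in the same way; whichever stage dominates the per-frame time is where optimization effort should go first.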
Thank you for your reply. There is an official NVIDIA benchmark (https://github.com/NVIDIA-AI-IOT/jetson_benchmarks) where all the models reach hundreds of frames per second. That result should reflect only the GPU inference speed. My application is to detect fast-moving objects in real time, and we require the detection rate to reach 120 FPS. The whole pipeline is that the camera grabs an image and the NX processes it immediately. Do you have any good suggestions for speeding this up?
You might want to look into the use of NVIDIA DeepStream SDK.
In addition, think about whether you could do batched inference for your workload. This discussion might also be helpful: https://github.com/AlexeyAB/darknet/pull/5453#issuecomment-663593512 and https://github.com/AlexeyAB/darknet/pull/5453#issuecomment-665105334.
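The benefit of batching can be sketched with a toy cost model: each inference call pays a roughly fixed overhead (kernel launch, host<->device copies) plus a per-image compute cost, so batching amortizes the fixed part. The millisecond figures below are made-up assumptions for illustration, not measurements from any Jetson device:

```python
# Toy cost model for batched inference (all numbers are assumptions).
OVERHEAD_MS = 5.0    # fixed per-call overhead: launch, H2D/D2H copies
PER_IMAGE_MS = 4.0   # per-image compute cost

def fps(batch_size):
    """Throughput in frames per second under the toy cost model."""
    call_ms = OVERHEAD_MS + PER_IMAGE_MS * batch_size
    return 1000.0 * batch_size / call_ms

for b in (1, 4, 8):
    print(f"batch={b}: {fps(b):.1f} FPS")
```

Note the trade-off: larger batches raise throughput but also add latency, since you must wait for a full batch of frames before running inference, which may matter for a 120 FPS real-time detection requirement.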
I used yolov4-tiny as the test model. On a TX1, which has only about 1 TOPS, I get 10 FPS. The Xavier NX offers up to 21 TOPS, so I expected the model to reach about 100 FPS on the NX. But when I tested on the Xavier NX, I only got 20-30 FPS, and even after converting the model to TensorRT it only reached about 60 FPS, the same result as yours. So I want to know: if the compute of the NX is about 10x higher, why didn't the inference speed increase by that much?
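One way to see why a large jump in TOPS does not translate into the same jump in FPS is an Amdahl's-law-style calculation: only the GPU compute portion of the per-frame time shrinks with more TOPS, while capture, pre/post-processing, and memory copies stay roughly constant. The split below (80 ms GPU + 20 ms fixed on the TX1) is an assumed illustration chosen to match the reported 10 FPS, not a measured breakdown:

```python
# Amdahl's-law-style sketch (the 80/20 ms split is an assumption).
GPU_MS_TX1 = 80.0  # assumed GPU compute per frame on the ~1-TOPS TX1
OTHER_MS = 20.0    # assumed fixed CPU-side cost per frame

def fps(tops_ratio):
    """FPS when GPU compute scales with TOPS but other costs do not."""
    return 1000.0 / (GPU_MS_TX1 / tops_ratio + OTHER_MS)

print(f"TX1 (1x TOPS):  {fps(1):.1f} FPS")
print(f"NX (21x TOPS):  {fps(21):.1f} FPS")
```

Under these assumed numbers the 21x compute increase yields only about a 4x FPS increase, because the fixed 20 ms per frame caps throughput at 50 FPS no matter how fast the GPU gets. This is consistent with the 20-60 FPS range observed, and it is why profiling the non-GPU stages matters.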