Low performance - Githubissues

nwesem / mtcnn_facenet_cpp_tensorRT

Face Recognition on NVIDIA Jetson (Nano) using TensorRT

GNU General Public License v3.0

203 stars, 72 forks

Low performance #21

Open lagurus opened 3 years ago

lagurus commented 3 years ago

Great work, but I am not able to reach the desired frame rate (15 fps on Jetson Nano).

My results are quite low (3 fps).

With _LOGTIMES it shows: mtCNN took 201ms, Forward took 135ms, Feature matching took 0ms.
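A quick sanity check on these timings, as a sketch only: assuming the three logged stages run one after another on every frame, the frame time is at least their sum, which puts a ceiling on the frame rate. The function name `maxFpsFromStageLatencies` is made up for this illustration and is not part of the repo.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Upper bound on frames per second when the stages run back to back:
// one frame costs at least the sum of the stage latencies.
// (Hypothetical helper, not taken from the repository.)
double maxFpsFromStageLatencies(const std::vector<double>& stage_ms) {
    const double total_ms = std::accumulate(stage_ms.begin(), stage_ms.end(), 0.0);
    assert(total_ms > 0.0 && "at least one stage must take measurable time");
    return 1000.0 / total_ms;
}
```

With the values logged here, `maxFpsFromStageLatencies({201.0, 135.0, 0.0})` is 1000 / 336, a little under 3 fps, which is consistent with the observed rate before any capture or display overhead is counted.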

I am using the latest Jetpack, 4.4.1.

#jetson_release

NVIDIA Jetson Nano (Developer Kit Version)

Jetpack 4.4.1 [L4T 32.4.4]

NV Power Mode: MAXN - Type: 0

jetson_stats.service: active

Libraries:
- Vulkan: 1.2.70

Could it be related to one of these? https://github.com/opencv/opencv/issues/18340 or https://forums.developer.nvidia.com/t/darknet-slower-using-jetpack-4-4-cudnn-8-0-0-cuda-10-2-than-jetpack-4-3-cudnn-7-6-3-cuda-10-0/121579

Thanks

nwesem commented 3 years ago

That's interesting... I will look into that as soon as I can.

lagurus commented 3 years ago

I think I have figured out what the performance problem is about. I tested with video/picture input containing up to 6 faces:

no faces in video - 43 fps
1 face - 17 fps
3 faces - 8 fps
6 faces - 3 fps
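To see how the cost grows with the number of faces, the frame rates can be converted to milliseconds per frame and compared step by step. This is a rough sketch: `Sample`, `frameTimeMs` and `marginalCostPerFaceMs` are names invented for the example, and the fps figures are rounded, so the results are only approximate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample {
    int faces;   // number of faces in the frame
    double fps;  // measured frame rate
};

// Convert a frame rate to milliseconds per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per additional face between consecutive samples.
// Samples are expected in order of increasing face count.
std::vector<double> marginalCostPerFaceMs(const std::vector<Sample>& samples) {
    std::vector<double> costs;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const int extra_faces = samples[i].faces - samples[i - 1].faces;
        assert(extra_faces > 0 && "samples must be sorted by increasing face count");
        const double extra_ms =
            frameTimeMs(samples[i].fps) - frameTimeMs(samples[i - 1].fps);
        costs.push_back(extra_ms / extra_faces);
    }
    return costs;
}
```

Fed with `{{0, 43.0}, {1, 17.0}, {3, 8.0}, {6, 3.0}}`, this gives roughly 36, 33 and 69 ms per additional face, so each face costs tens of milliseconds and the per-face cost does not look constant.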

nwesem commented 3 years ago

I think we could batch-process the faces and speed it up that way. I tried doing that but had problems with a dynamic number of faces as input to the TensorRT inference. I guess we could set a maximum of, e.g., 6 faces, and when there are fewer faces, fill the remaining face matrices with zeros. That would mean you need to change the TensorRT model of the FaceNet. But note that the mtCNN alone already uses up 1/5 of a second per frame (the 201 ms above), which means you will not reach more than 5 FPS, and I am not sure whether the speed of the mtCNN can still be increased.
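The zero-padding idea could look roughly like the sketch below. It only shows how a fixed-size input buffer might be filled and makes no TensorRT calls. The sizes (`kMaxFaces = 6`, a 160x160x3 float input per face) and the name `packPaddedBatch` are assumptions for illustration, not values taken from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sizes for illustration only: fixed batch of 6 faces,
// each preprocessed to 160x160x3 floats.
constexpr std::size_t kMaxFaces = 6;
constexpr std::size_t kFaceElems = 160 * 160 * 3;

// Copy up to kMaxFaces face tensors into one fixed-size buffer and leave
// the unused slots zero-filled. Returns the number of real faces packed.
std::size_t packPaddedBatch(const std::vector<std::vector<float>>& faces,
                            std::vector<float>& batch) {
    batch.assign(kMaxFaces * kFaceElems, 0.0f);  // zero padding for empty slots
    const std::size_t n = std::min(faces.size(), kMaxFaces);
    for (std::size_t i = 0; i < n; ++i) {
        assert(faces[i].size() == kFaceElems && "face must match the input size");
        std::copy(faces[i].begin(), faces[i].end(),
                  batch.begin() + static_cast<std::ptrdiff_t>(i * kFaceElems));
    }
    return n;
}
```

The returned count tells the caller which output embeddings belong to real faces; outputs for padded slots would have to be ignored. In this sketch, faces beyond the maximum are simply not packed and would need a second batch.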

Tetsujinfr commented 3 years ago

FYI, I am getting 22 fps with 1 face and no images in the imgs DB folder on a Jetson Xavier NX. Cool repo, but I would have expected faster inference on the NX, although I suspect the fps estimate might be capped by 1) the OpenCV frame window rendering and 2) the webcam frame grab time. I need to play with it a bit more.
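One way to check whether rendering or frame grabbing caps the rate is to time each stage separately. The sketch below uses only `std::chrono`; `timeMs` and `StageStats` are hypothetical helpers, and which calls get wrapped (frame grab, inference, window display) depends on how the main loop is actually written.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run a callable once and return how long it took in milliseconds.
template <typename F>
double timeMs(F&& stage) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(stage)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Running average of one stage's latency over many frames.
struct StageStats {
    double total_ms = 0.0;
    long frames = 0;

    void add(double ms) {
        assert(ms >= 0.0);
        total_ms += ms;
        ++frames;
    }

    double meanMs() const { return frames > 0 ? total_ms / frames : 0.0; }
};
```

Keeping one `StageStats` per stage and calling, for example, `grab.add(timeMs([&] { /* read one frame */ }))` inside the loop would show how much of each frame goes to capture and display rather than inference.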

I am modifying main.cpp to add some command-line arguments, e.g. an input folder, so that the face detection piece of the model can be used without the recognition piece. The recognition piece, though, is really cool and quite robust at first glance.
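A minimal sketch of what such argument handling might look like. The flag names `--input-folder` and `--detect-only`, the `Options` struct and `parseArgs` are all hypothetical and not part of the existing main.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical options; the flag names are invented for this example.
struct Options {
    std::string input_folder;  // empty means "use the camera"
    bool detect_only = false;  // run detection and skip recognition
};

// Parse the arguments that follow the program name.
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "--detect-only") {
            opts.detect_only = true;
        } else if (args[i] == "--input-folder") {
            assert(i + 1 < args.size() && "--input-folder needs a path");
            opts.input_folder = args[++i];
        }
        // Unknown arguments are ignored in this sketch.
    }
    return opts;
}
```

From `main(int argc, char** argv)` it could be called as `parseArgs(std::vector<std::string>(argv + 1, argv + argc))`.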