zhouyuchong / face-recognition-deepstream

Deepstream app use retinaface and arcface for face recognition.
MIT License

Incorrect Keypoint and Bounding Box Outputs with RetinaFace Custom Parser in DeepStream 6.3 #36


sowmiya-masterworks commented 2 months ago

I'm using the RetinaFace custom parser from the face-recognition-deepstream repo and encountering several issues with keypoint detection and bounding box accuracy in DeepStream 6.3.

Environment

- Hardware: NVIDIA GeForce RTX 3060
- Driver Version: 555.42.06
- CUDA Version: 12.5
- DeepStream Version: 6.3
- Operating System: [Please specify your OS, e.g., Ubuntu 20.04]
- Test Applications: DeepStream Test5 (both C++ and Python versions)

Models and Weights

- Models Tried: Both ResNet50 and MobileNet architectures were tested.
- Weights: The model weights used are sourced from the [Pytorch_Retinaface repository](https://github.com/biubug6/Pytorch_Retinaface)

Expected Behavior

Accurate detection and output of face landmarks and bounding boxes in the video stream.

Actual Behavior

- Landmarks: Many keypoint coordinates are either zero or negative, which does not correspond to valid pixel values.
- Bounding Boxes: Outputs are often unrealistic (e.g., exceedingly large dimensions).
- Video Output: No detections appear in the output video.

Steps to Reproduce

- Tested the setup using the DeepStream Test5 application for both the primary inference engine (pgie) and the secondary inference engine (sgie).
- Also tested using the Python3 main.py script provided in the repository.
- In all tests, inappropriate bounding boxes and keypoints were observed across different setups and models.

Console Output

```
Raw output array:
output[0] = 0.937012   output[1] = 1.41797    output[2] = -1.57422   output[3] = -0.820801
output[4] = 1.58301    output[5] = 0.605469   output[6] = -0.943848  output[7] = -0.15332
output[8] = -1.27832   output[9] = 1.35547    output[10] = -0.252197 output[11] = -0.817383
output[12] = 0.0300903 output[13] = 1.14355   output[14] = -0.243774 output[15] = -0.335449
Clipped BBox: 1.41797, 0, 0, 1.58301
Detection: Top: 0 Left: 1 Width: 4.29497e+09 Height: 1 Confidence: 0.605469
Landmarks: 0 0 -1 1 0 0 0 1 0 0
Raw output array:
output[0] = 0.935547   output[1] = 1.41797    output[2] = -1.57324   output[3] = -0.819336
output[4] = 1.58203    output[5] = 0.60498    output[6] = -0.943359  output[7] = -0.15332
output[8] = -1.27734   output[9] = 1.35352    output[10] = -0.25293  output[11] = -0.816895
output[12] = 0.0317383 output[13] = 1.14355   output[14] = -0.243042 output[15] = -0.334473
Clipped BBox: 1.41797, 0, 0, 1.58203
Detection: Top: 0 Left: 1 Width: 4.29497e+09 Height: 1 Confidence: 0.60498
Landmarks: 0 0 -1 1 0 0 0 1 0 0
```

Athuliva commented 2 months ago

With RetinaFace ResNet50 I am getting correct bounding boxes. Can you tell me how you generated the engine file from https://github.com/biubug6/Pytorch_Retinaface?

sowmiya-masterworks commented 2 months ago

@Athuliva I used https://github.com/biubug6/Pytorch_Retinaface/blob/master/convert_to_onnx.py for the ONNX conversion, and for the engine conversion (inside the DeepStream 6.3 Docker container):

```
/usr/src/tensorrt/bin/trtexec --onnx=FaceDetector.onnx --explicitBatch --workspace=204 --saveEngine=FaceDetector.engine --fp16
```

sowmiya-masterworks commented 2 months ago

@Athuliva While using an engine file generated the https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface way, I was facing an error when using it with the DeepStream Test5 application!

Athuliva commented 2 months ago

@sowmiya-masterworks have you loaded libdecodeplugin.so while using https://github.com/wang-xinyu/tensorrtx/tree/master/retinaface?

```python
import ctypes
# load the custom TensorRT decode plugin before deserializing the engine
ctypes.cdll.LoadLibrary('/VA/retinaface_r50_63/R50/libdecodeplugin.so')
```

zhouyuchong commented 2 months ago

@sowmiya-masterworks try this. BTW, how do you decode those bboxes and landmarks? Since RetinaFace is an anchor-based model, the raw outputs must be post-processed, otherwise they are unreadable.
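For reference, the decode in the biubug6/Pytorch_Retinaface repo works roughly like the sketch below: priors are generated per feature-map cell in normalized (cx, cy, w, h) form, then the raw loc/landmark offsets are decoded against them with variances (0.1, 0.2). This is a minimal NumPy sketch following that repo's logic, not code from this repository, so treat the defaults as assumptions:

```python
from itertools import product
import numpy as np

def prior_boxes(image_size=(640, 640),
                min_sizes=((16, 32), (64, 128), (256, 512)),
                steps=(8, 16, 32)):
    """Anchors in normalized (cx, cy, w, h); two sizes per cell per level."""
    anchors = []
    for k, step in enumerate(steps):
        fh = int(np.ceil(image_size[0] / step))
        fw = int(np.ceil(image_size[1] / step))
        for i, j in product(range(fh), range(fw)):
            for ms in min_sizes[k]:
                cx = (j + 0.5) * step / image_size[1]
                cy = (i + 0.5) * step / image_size[0]
                anchors.append([cx, cy, ms / image_size[1], ms / image_size[0]])
    return np.asarray(anchors, dtype=np.float32)

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    """Raw loc offsets -> (x1, y1, w, h), still normalized to [0, 1]."""
    boxes = np.concatenate((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])), axis=1)
    boxes[:, :2] -= boxes[:, 2:] / 2  # center -> top-left corner
    return boxes

def decode_landmarks(ldm, priors, variances=(0.1, 0.2)):
    """Five (x, y) landmark points, each an offset from the prior center."""
    return np.concatenate(
        [priors[:, :2] + ldm[:, 2 * i:2 * i + 2] * variances[0] * priors[:, 2:]
         for i in range(5)], axis=1)
```

Multiply the decoded values by the input width/height to get pixel coordinates; zero or negative raw offsets before this step are perfectly normal, which is consistent with the values in the console output above.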

sowmiya-masterworks commented 2 months ago

@zhouyuchong, thanks for the suggestion! Could you provide some guidance or recommend a repository for decoding the bounding boxes and landmarks from RetinaFace inside the DeepStream environment? Since it's an anchor-based model, I understand that the raw outputs need post-processing to be interpretable, and any pointers on how to approach this within DeepStream would be greatly appreciated.

zhouyuchong commented 2 months ago

@sowmiya-masterworks For the C++ version: if you use custom-lib-path in the nvinfer config, note that there is no support for landmarks in the official data structure. For the Python version: post-process yourself. If you want to apply it in DeepStream, just get the raw outputs, which I think you already know how to do, then do the post-processing in a gst-probe callback function.
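For anyone taking the raw-tensor route, the relevant nvinfer settings look roughly like this sketch; the key names are from the standard Gst-nvinfer property table, and the file names are placeholders matching the trtexec command earlier in this thread:

```ini
[property]
onnx-file=FaceDetector.onnx
model-engine-file=FaceDetector.engine
# fp16
network-mode=2
# attach raw output tensors as NvDsInferTensorMeta for a gst-probe to consume
output-tensor-meta=1
# network-type=100 ("other") skips the built-in bbox parser entirely
network-type=100
```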