Open marcoslucianops opened 1 year ago
@marcoslucianops could you please share code to evaluate .engine model?
I will share it in the future.
Do I need the file "libnvdsinfer_custom_impl_Yolo.so" generated by the command "CUDA_VER=11.8 make -C nvdsinfer_custom_impl_Yolo" for evaluation, or only the .engine model?
My eval code is based on deepstream_python_apps with some custom implementations (image batch input, pycocotools, etc.). It uses DeepStream to generate the JSON that pycocotools then evaluates.
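For readers who want to reproduce the last step, here is a minimal sketch of how a detections JSON could be built and scored with pycocotools. It is not the eval code described above. The `category_ids` mapping is a hypothetical helper you have to supply yourself (COCO category IDs are not contiguous, so the 80 model class indices cannot be used directly), and the file paths are placeholders.

```python
import json


def to_coco_results(detections, category_ids):
    """Build the list pycocotools expects from (image_id, label, bbox, score).

    bbox is [x, y, width, height] in the original image's pixel coordinates.
    category_ids maps a label string to its COCO category ID (assumed to be
    provided by the caller).
    """
    return [
        {
            "image_id": image_id,
            "category_id": category_ids[label],
            "bbox": [round(v, 3) for v in bbox],
            "score": round(score, 5),
        }
        for image_id, label, bbox, score in detections
    ]


def save_results(results, path):
    """Write the results list to a JSON file."""
    with open(path, "w") as f:
        json.dump(results, f)


def evaluate(annotation_file, results_file):
    """Run the standard COCO bbox evaluation and return the 12 summary stats."""
    # Imported lazily so the JSON helpers work without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)
    coco_dt = coco_gt.loadRes(results_file)
    coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    return coco_eval.stats
```

A wrong label-to-category mapping or boxes left in the wrong coordinate system will both lower the mAP without raising any error, so those are worth checking first.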
I run inference on each image in COCO val and collect the labels to generate a JSON file. But I got a low mAP for the yolov7 fp32 .engine model:
mAP0.5:0.95 = 0.4 mAP0.5 = 0.538 mAP0.75 = 0.435
It is too low compared to your benchmark, even if you use only yolov6 fp16 .engine model
In the models I've tested, there's no mAP difference between FP32 and FP16 engines. Are you using the DeepStream to output the bboxes?
Yes. I run the DeepStream app on the images and save the output (labels) to files by setting gie-kitti-output-dir. Then I collected the labels and generated JSON files to evaluate. My mAP is too low.
In the KITTI output, the bbox coordinates are relative to the streammux resolution you set. You need to rescale them to each validation image's resolution.
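A minimal sketch of that rescaling, under two assumptions that are not confirmed in this thread: each KITTI line has the usual column order (label, three fields, then left/top/right/bottom, seven unused fields, and the confidence last), and streammux stretches the frame without padding. `kitti_to_image_bbox` is a hypothetical helper name.

```python
def kitti_to_image_bbox(line, mux_size, image_size):
    """Convert one KITTI line (streammux coordinates) to a COCO-style box
    [x, y, width, height] in the original image's coordinates."""
    parts = line.split()
    # The last 15 fields are numeric; anything before them is the label,
    # which may contain spaces (e.g. "traffic light").
    label = " ".join(parts[:-15])
    nums = [float(v) for v in parts[-15:]]
    left, top, right, bottom = nums[3:7]
    score = nums[-1]  # assumes the confidence is written as the last field

    mux_w, mux_h = mux_size
    img_w, img_h = image_size
    # Assumes a plain stretch from image size to streammux size (no padding).
    sx, sy = img_w / mux_w, img_h / mux_h
    bbox = [left * sx, top * sy, (right - left) * sx, (bottom - top) * sy]
    return label, bbox, score
```

If streammux padding is enabled, the padding offset would have to be subtracted before scaling, which this sketch does not do.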
Yes, I noticed that and also rescaled the boxes to each image's size, but the mAP is still too low.
Did you set
[class-attrs-all]
nms-iou-threshold=0.65
pre-cluster-threshold=0.001
topk=300
In the config_infer_primary_yoloV7.txt file?
Did you use the config above to get your benchmark? I used the default setup:
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300
The evaluation uses different NMS and confidence thresholds. Try with the values I sent.
Thanks a lot for supporting me. I am going to try it now😍
I used this setup. The mAP is better, but it is still lower than your benchmark for YOLOv7. Here is my result for the fp32 .engine model:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.449
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.623
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.485
I attached my configs (I have several, such as final_config_1.txt): config_infer_primary_yoloV7.txt final_config_1.txt
My eval code is fine-tuned to extract the best possible mAP from DeepStream, which is why I got a slightly higher mAP.
In the models I've tested, there's no mAP difference between FP32 and FP16 engines. Are you using the DeepStream to output the bboxes?
@marcoslucianops Do you mean the YOLOv7 model? I saw that your fp16 .engine model has mAP0.5:0.95 = 0.476, which means the fp32 .engine model also has mAP0.5:0.95 = 0.476. That is too low compared with the reference .pt model's mAP0.5:0.95 = 0.514: https://github.com/WongKinYiu/yolov7#performance
There's a drop on TensorRT compared to the PyTorch model. In some models, it's a significant drop. In other models (like PPYOLOE and YOLO-NAS), it's a small one. The test I did compared the ONNX export method with the wts and cfg export method. There's no drop between those two export methods.
Thanks a lot. I expected fp32 not to drop the mAP much. If the mAP of fp32 or fp16 drops this much, the mAP of int8 will be even lower.
The FP16 and FP32 mAP are equal.
Yeah, I think so. In your opinion, what is the reason for the big mAP drop of the fp32 and fp16 engines compared with the .pt models? I mean for some models, including yolov7. I saw that yolov7 fp16 dropped by about 4%.
In my opinion, TensorRT layers are performance focused, making some tweaks to precisions and parameters. So it's faster, but loses some of the accuracy.
Thanks for sharing.
Could this be related to inputs being different, not only TensorRT tweaks? For instance, in YOLOv8 it looks like symmetric padding is done with a grayscale value rather than with black color like DeepStream's `nvstreammux` does.
Edit: I also saw the following warning when running with exported ONNX models. Could this be another reason for the drop in performance? Is it possible to export using INT32 instead of INT64?
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
In any case, it would be good to have a table of the expected drop for each of the models, as a reference.
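On the INT64 warning above: one way to judge whether the cast matters is to check which INT64 values would actually change when clamped to the INT32 range. The sketch below shows that check on a plain NumPy array. How you extract the INT64 initializers from your ONNX file is left out, and `clamp_to_int32` is a hypothetical helper, not part of TensorRT or ONNX.

```python
import numpy as np

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max


def clamp_to_int32(weights):
    """Clamp INT64 values to the INT32 range.

    Returns the clamped INT32 array and a boolean mask marking the entries
    whose value changed because of the clamp.
    """
    weights = np.asarray(weights, dtype=np.int64)
    clamped = np.clip(weights, INT32_MIN, INT32_MAX).astype(np.int32)
    changed = weights != clamped
    return clamped, changed


# Example: two ordinary values and one very large sentinel-like value.
clamped, changed = clamp_to_int32([0, 5, 2**63 - 1])
```

If the only entries that change are very large sentinel-like values (for example a "slice to the end" bound), the clamp is unlikely to alter the result, but that is an assumption to verify on your own model rather than a general guarantee.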
@cgrtrifork any update? I get the same warning: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. How can I solve the problem?
When I run inference with yolov8s in DeepStream 6.3 on an NVIDIA AGX Orin DK, I have a question: fp32 on the GPU runs at ~30 fps, but fp16 on GPU + DLA0 runs at ~11 fps, which is a significant drop.
Could someone explain this and give me some guidance?
I ran the following experiment: I am trying out YOLOv8 object detection on an image that contains an object.
1. I used this repository to export the model to ONNX. Then, using `ffmpeg`, I generated a single-frame video that I feed into DeepStream with a confidence threshold `pre-cluster-threshold=0.2`.
   * When I use NMS clustering (`cluster-mode=2`, `nms-iou-threshold=0.5`) the object is _not_ found.
   * If I disable the clustering (`cluster-mode=4`) then an object is found with confidence 0.77.
2. I used Triton Inference Server to serve the same TensorRT model that is generated when running DeepStream. Then I ran the inference on the same image. I preprocessed the image to get a 3x640x640 image of float32 between 0 and 1 in RGB format, as expected by the model.
   * When I use a gray background for the padding (pixel value = 114/255), like YOLO does, the max score of the output is 0.87.
   * When I use a black background for the padding (pixel value = 0), like DeepStream does, the max score of the output is 0.90.
Having used the same TensorRT model, this makes me think there is an issue either on the parsing and interpretation of the output from the model, or deeper in a lower level DeepStream preprocessing of the image.
1. Why does enabling the NMS remove the detection? If the detection is the maximum score the NMS shouldn't remove it.
2. Why are the scores different between DeepStream's `nvinfer` plugin and Triton Inference Server?
For completeness:
* The image I'm using was originally extracted from a video by doing:
# extract all the frames from the original video into a folder
# frames are enumerated starting from 1
ffmpeg -i original_video.mp4 original_video/%05d.jpg
Then I chose the frame to use (number 84), and I created the single-frame video by doing:
# frames start from 0, that's why we choose 84-1=83
ffmpeg -i original_video.mp4 -vf "select=eq(n\,83)" single_frame_video.mp4
* The pipeline I'm using in DeepStream is: `nvurisrcbin` -> `videorate` -> `nvvideoconvert` -> `capsfilter` -> `nvstreammux` -> `queue` -> `nvvideoconvert` -> `capsfilter` -> `nvinfer` -> `fakesink`. I'm adding a probe after the `nvinfer` to see the detections.
@marcoslucianops have you tried evaluating the engine file outside of DeepStream?
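The two padding variants compared in the Triton experiment above can be sketched roughly as follows. This is an illustration only: it assumes an RGB `uint8` input, uses a nearest-neighbour resize in plain NumPy to stay dependency-free (real pipelines usually interpolate), and `letterbox` is a hypothetical helper, not the preprocessing code used by DeepStream or YOLOv8.

```python
import numpy as np


def letterbox(image, size=640, pad_value=114):
    """Resize an HxWx3 uint8 RGB image keeping its aspect ratio, then pad it
    symmetrically to size x size.

    Returns a 3 x size x size float32 array scaled to [0, 1].
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = round(h * scale), round(w * scale)

    # Nearest-neighbour resize: pick the source row/column for each target one.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Fill the canvas with the padding colour, then paste the image centred.
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    # HWC uint8 -> CHW float32 in [0, 1].
    return canvas.transpose(2, 0, 1).astype(np.float32) / 255.0


frame = np.full((360, 640, 3), 255, dtype=np.uint8)  # dummy white frame
gray_padded = letterbox(frame, pad_value=114)  # gray padding (114/255)
black_padded = letterbox(frame, pad_value=0)   # black padding
```

Feeding both tensors to the same engine is one way to isolate how much of a score difference comes from the padding colour alone.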
Following up on this, I found out that the parsing from `NvDsInferParseYolo` seems to be correct for this case. However, the resulting detection from DeepStream is not the one with the highest confidence. Here you can see the logs from DeepStream (I added print statements to the library):
[Class 0] Box proposal with confidence 0.750208: x1=185.988, y1=141.614, x2=499.038, y2=417.46 (threshold: 0.2)
[Class 0] BBI with confidence 0.750208: left=185.988, top=141.614, width=313.05, height=275.846
[Class 0] Box proposal with confidence 0.881455: x1=184.819, y1=141.771, x2=497.486, y2=416.067 (threshold: 0.2)
[Class 0] BBI with confidence 0.881455: left=184.819, top=141.771, width=312.667, height=274.296
[Class 0] Box proposal with confidence 0.886627: x1=185.479, y1=141.421, x2=499.159, y2=415.547 (threshold: 0.2)
[Class 0] BBI with confidence 0.886627: left=185.479, top=141.421, width=313.68, height=274.127
[Class 0] Box proposal with confidence 0.877862: x1=185.409, y1=141.396, x2=499.173, y2=415.735 (threshold: 0.2)
[Class 0] BBI with confidence 0.877862: left=185.409, top=141.396, width=313.764, height=274.339
[Class 0] Box proposal with confidence 0.866284: x1=184.766, y1=141.94, x2=497.723, y2=416.012 (threshold: 0.2)
[Class 0] BBI with confidence 0.866284: left=184.766, top=141.94, width=312.958, height=274.072
[Class 0] Box proposal with confidence 0.854519: x1=184.601, y1=141.577, x2=499.699, y2=415.742 (threshold: 0.2)
[Class 0] BBI with confidence 0.854519: left=184.601, top=141.577, width=315.097, height=274.165
[Class 0] Box proposal with confidence 0.856617: x1=185.726, y1=141.448, x2=499.246, y2=415.667 (threshold: 0.2)
[Class 0] BBI with confidence 0.856617: left=185.726, top=141.448, width=313.52, height=274.219
[Class 0] Box proposal with confidence 0.770557: x1=184.458, y1=142.046, x2=498.037, y2=416.368 (threshold: 0.2)
[Class 0] BBI with confidence 0.770557: left=184.458, top=142.046, width=313.579, height=274.322
[Class 0] Box proposal with confidence 0.752778: x1=184.416, y1=141.868, x2=499.955, y2=416.512 (threshold: 0.2)
[Class 0] BBI with confidence 0.752778: left=184.416, top=141.868, width=315.539, height=274.644
[Class 0] Box proposal with confidence 0.725658: x1=185.231, y1=141.948, x2=499.762, y2=416.444 (threshold: 0.2)
[Class 0] BBI with confidence 0.725658: left=185.231, top=141.948, width=314.531, height=274.496
[Class 0] Box proposal with confidence 0.23098: x1=184.02, y1=141.643, x2=500.4, y2=416.408 (threshold: 0.2)
[Class 0] BBI with confidence 0.23098: left=184.02, top=141.643, width=316.38, height=274.764
Objects decoded: 11
ObjectList after assignment: 11
2024-02-12 14:21:33,705 [INFO][root] Frame number: 0
2024-02-12 14:21:33,705 [INFO][root] [Class 0] Found object with confidence = 0.7705574035644531: left=332.0246276855469, top=0.0827464759349823, width=564.4418334960938, height=494.5528259277344
The DeepStream version I'm using is 6.2, I will test this in newer versions too.
EDIT: It seems to be fixed when upgrading to DeepStream 6.3, now all the detections are found if NMS is disabled, and only the correct maximum confidence detection is found when using NMS.
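For reference, the expected behaviour can be checked with a textbook greedy NMS on a few of the logged proposals. This is a minimal sketch of the general algorithm, not DeepStream's internal clustering code. The boxes and scores are copied from the log above, and 0.5 matches the `nms-iou-threshold` used in the experiment.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Four of the proposals from the log (x1, y1, x2, y2) and their confidences.
boxes = [
    (185.988, 141.614, 499.038, 417.46),
    (184.819, 141.771, 497.486, 416.067),
    (185.479, 141.421, 499.159, 415.547),
    (184.02, 141.643, 500.4, 416.408),
]
scores = [0.750208, 0.881455, 0.886627, 0.23098]

kept = nms(boxes, scores)  # indices of the surviving boxes
```

Since all proposals overlap almost completely, only the highest-confidence box (0.886627) should survive, which matches the behaviour reported after the upgrade to DeepStream 6.3.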
I evaluated the mAP of the get_wts model and the ONNX model, and both showed an accuracy drop after the TensorRT conversion. My conclusion is that TensorRT loses some accuracy when optimizing the layers.
YOLOv8n ONNX:
YOLOv8n get_wts_yolov8.py