marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models

MIT License

Confusing output format #347

Closed aidevmin closed 1 year ago

aidevmin commented 1 year ago

I see that the output format of the yolov7 .engine model is as follows:

`0 INPUT kFLOAT input 3x640x640`
`1 OUTPUT kFLOAT output 25200x7`

I understand that 25200 = 3 x (80^2 + 40^2 + 20^2), but what does the value 7 mean? As far as I know, each bbox has x_min, y_min, w, h, class_id, class_prob, which is 6 values in total. Could you clarify each of the 7 values? Also, are the bounding box sizes relative or absolute values? Thanks
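
The 25200 figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming strides of 8, 16 and 32 (which give the 80, 40 and 20 grids for a 640x640 input) and 3 anchors per grid cell; the function name is hypothetical and not part of the repository.

```python
def count_predictions(input_size, strides, anchors_per_cell):
    """Count candidate boxes over all detection grids (square input assumed)."""
    total = 0
    for stride in strides:
        grid = input_size // stride  # 640 // 8 = 80, 640 // 16 = 40, 640 // 32 = 20
        total += anchors_per_cell * grid * grid
    return total


print(count_predictions(640, (8, 16, 32), 3))  # 25200
print(count_predictions(640, (8, 16, 32), 1))  # 8400
```

With one prediction per grid cell instead of three, the same grids give 8400 rows.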

marcoslucianops commented 1 year ago

In the new update, it's 25200x6. The '7' was the objectness (used in many YOLO models), but now it's multiplying the objectness with the scores in the DeepStreamOutput layer (PyTorch output implementation).
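
As a rough illustration of what multiplying the objectness with the scores means, here is a pure-Python sketch. It is not the DeepStreamOutput layer itself; the function name and the sample values are made up for the example.

```python
def fold_objectness(objectness, class_probs):
    """Multiply each class probability by the box objectness."""
    return [objectness * p for p in class_probs]


# Hypothetical prediction: objectness 0.5 and three class probabilities.
scores = fold_objectness(0.5, [0.5, 0.25, 0.0])
print(scores)  # [0.25, 0.125, 0.0]
```

Once the objectness is folded into the scores this way, it no longer needs its own column in the output.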

marcoslucianops commented 1 year ago

The size of the bboxes are absolute values according to the input size.
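
If relative values are needed, absolute coordinates can be divided by the input size. A minimal sketch, assuming the 640x640 input from this issue and an [x, y, w, h] box in network-input pixels; the function name is hypothetical.

```python
def to_relative(box, input_w=640, input_h=640):
    """Scale an absolute [x, y, w, h] box (network-input pixels) to the 0..1 range."""
    x, y, w, h = box
    return [x / input_w, y / input_h, w / input_w, h / input_h]


print(to_relative([320.0, 160.0, 160.0, 80.0]))  # [0.5, 0.25, 0.25, 0.125]
```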

aidevmin commented 1 year ago

> The size of the bboxes are absolute values according to the input size.

Yeah, thank you. But the output format is (25200, 7). Could you clarify the 7 values? Usually we have xmin, ymin, w, h, class id, class prob, which is 6 in total.

marcoslucianops commented 1 year ago

In the new update, it's 25200x6. The '7' was the objectness (used in many YOLO models), but now it's multiplying the objectness with the scores in the DeepStreamOutput layer (PyTorch output implementation).

aidevmin commented 1 year ago

> In the new update, it's 25200x6. The '7' was the objectness (used in many YOLO models), but now it's multiplying the objectness with the scores in the DeepStreamOutput layer (PyTorch output implementation).

It turns out that you committed again today. I ran the source yesterday and got (25200, 7). I will check the latest source code. Thanks

marcoslucianops commented 1 year ago

I changed it again to fix some errors. The output now is: boxes [8400, 4], scores [8400, 1] (best score) and classes [8400, 1] (class id from the best score).
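
To make "best score" and "class id from the best score" concrete, here is a small pure-Python sketch of the reduction from per-class scores to one score and one class id per box. The function name and sample rows are hypothetical, and it is not taken from the export code.

```python
def best_score_and_class(class_scores):
    """Return (best score, class id) for one prediction's per-class scores."""
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return class_scores[class_id], class_id


# Two hypothetical predictions with three classes each.
rows = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
scores = [[best_score_and_class(r)[0]] for r in rows]   # shape [N, 1]
classes = [[best_score_and_class(r)[1]] for r in rows]  # shape [N, 1]
print(scores, classes)  # [[0.7], [0.6]] [[1], [0]]
```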

aidevmin commented 1 year ago

@marcoslucianops yeah, I think it would be better to provide a consistent output format for all YOLO models, so that one postprocessing routine can be used for all of them. That is why I sometimes need to check the output format. It would help if you mentioned it in Readme.md for easier usage.

marcoslucianops commented 1 year ago

In the current version, there is only one output for all of the models:

boxes [8400, 4], scores [8400, 1] (best score) and classes [8400, 1] (class id from the best score)

The only difference is the parser function set in parse-bbox-func-name (NvDsInferParseYolo or NvDsInferParseYoloE).

The NvDsInferParseYoloE is for PPYOLOE, YOLO-NAS and DAMO-YOLO output and it's [left, right, width, height] in the engine output.
The NvDsInferParseYolo is for the other models output and it's [x_center, y_center, width, height] in the engine output.
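
For the [x_center, y_center, width, height] layout, a postprocessing step typically converts each box to corner coordinates. The following is only a sketch of that conversion with a hypothetical function name; it is not the parser's actual code and does not cover the NvDsInferParseYoloE layout.

```python
def center_to_corners(box):
    """Convert [x_center, y_center, width, height] to [left, top, right, bottom]."""
    xc, yc, w, h = box
    return [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


print(center_to_corners([320.0, 240.0, 100.0, 50.0]))  # [270.0, 215.0, 370.0, 265.0]
```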