open-mmlab / mmdeploy

OpenMMLab Model Deployment Framework
https://mmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Bounding box coordinates returned by the pose tracker API are not scaled correctly #2805

Open wbudd opened 1 month ago

wbudd commented 1 month ago

Checklist

Describe the bug

The results returned by the pose tracker include both an array of pose detections and an array of the bounding boxes within which those poses were detected.

However, even though the returned coordinates of the pose detections are scaled correctly in relation to the original input image, the same is not done for the returned bounding boxes.

Furthermore, this seems difficult for the API consumer to work around, because intermediate image sizes and/or scale factors do not seem to be exposed by the API. In other words, the API consumer is given no clue what the bounding box coordinates need to be multiplied by to get dimensions that make sense in relation to the original input image.

Reproduction

Simply run the mmdeploy-provided pose tracker demo for Python, or any other supported language. As input, use one or more small images (640x480 or smaller) that clearly depict a person. The bottom-right x/y of the returned bounding box will be larger than the width/height of the entire input image.
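The symptom can be checked mechanically once the boxes are in hand. The following is a minimal sketch, not part of mmdeploy: `boxes_outside_image` is a hypothetical helper, and it assumes each box is an `(x1, y1, x2, y2)` tuple in pixels, which may not match the layout the demo actually returns. The sample numbers in the usage lines are made up for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def boxes_outside_image(bboxes: Sequence[Box], width: int, height: int) -> List[int]:
    """Return the indices of boxes whose bottom-right corner lies outside the image."""
    bad = []
    for i, (_x1, _y1, x2, y2) in enumerate(bboxes):
        # A box expressed in original-image coordinates cannot extend
        # past the image's width or height.
        if x2 > width or y2 > height:
            bad.append(i)
    return bad


# Made-up example values for a 640x480 input image.
sample = [(10.0, 20.0, 300.0, 400.0), (50.0, 60.0, 900.0, 700.0)]
print(boxes_outside_image(sample, width=640, height=480))
```

Any non-empty result for a small input image would be consistent with the behavior described here.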

Environment

The mmdeploy Docker image provided by mmdeploy.

mmdet model used is `rtmdet-nano-ort`.
mmpose model used is `rtmw-m-trt-fp16`.

Both with default configurations, as generated by mmdeploy in accordance with the official docs.

(I suspect this bug applies to all or most model combinations, but I've only tested the above configuration.)

Error traceback

No response

wbudd commented 1 month ago

Looking a bit more at the mmdeploy code, I see that the bounding boxes the API returns are the result of the affine transforms specified in the pose model's pipeline.json (`tasks` → `transforms` → `"type": "TopDownAffine"`), with this being the corresponding code: https://github.com/open-mmlab/mmdeploy/blob/main/csrc/mmdeploy/codebase/mmpose/topdown_affine.cpp

The problem is that those resized bounding boxes are only relevant as input preparation for the pose model. They are not at all useful to the API consumer, who only knows about the original image size and should therefore receive the bounding box dimensions produced as output by the object detector model. The pose tracker API should preserve this information for the final result, but fails to do so.
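To make the expected behavior concrete, here is a rough sketch in Python rather than mmdeploy's C++. Every name in it (`PoseResult`, `expand_to_aspect`, `track_one`) is hypothetical, and the aspect ratio and padding values are placeholders for illustration, not the values mmdeploy uses. The only point is that the box prepared for the pose model stays internal, while the detector's box is what ends up in the result.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)
Point = Tuple[float, float]


@dataclass
class PoseResult:
    keypoints: List[Point]
    bbox: Box  # detector box, in original-image pixels


def expand_to_aspect(box: Box, aspect: float, padding: float = 1.25) -> Box:
    """Hypothetical stand-in for pose-model input preparation.

    Grows the box around its centre until width / height == aspect,
    then pads it. The default padding is an assumption.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    if w > aspect * h:
        h = w / aspect
    else:
        w = aspect * h
    w, h = w * padding, h * padding
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def track_one(det_box: Box,
              run_pose_model: Callable[[Box], List[Point]],
              aspect: float = 0.75) -> PoseResult:
    # The adjusted box is only an input to the pose model ...
    model_box = expand_to_aspect(det_box, aspect)
    keypoints = run_pose_model(model_box)
    # ... while the caller gets back the detector's original box.
    return PoseResult(keypoints=keypoints, bbox=det_box)
```

With this shape, a consumer who knows only the original image size can use `bbox` directly, with no scale factor needed.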