Open wbudd opened 1 month ago
Looking a bit more at the mmdeploy code, I see that the bounding boxes the API returns are the result of the affine transforms specified in the pose model's pipeline.json (tasks → transforms → "type": "TopDownAffine"), with this being the corresponding code: https://github.com/open-mmlab/mmdeploy/blob/main/csrc/mmdeploy/codebase/mmpose/topdown_affine.cpp
The problem is that those resized bounding boxes are only relevant as input preparation for the pose model. They are of no use to the API consumer, who only knows about the original image size and should therefore receive the bounding box dimensions produced as output by the object detector model. The pose tracker API should preserve this information in the final result, but fails to do so.
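To illustrate the kind of inverse mapping the tracker would need to apply, here is a sketch of a TopDownAffine-style crop warp in pure Python. The 192x256 input size and 1.25 padding factor are assumptions for the sake of the example, not mmdeploy's exact defaults, and the helper names are hypothetical:

```python
MODEL_W, MODEL_H = 192, 256   # assumed pose-model input size (not mmdeploy's exact default)
PADDING = 1.25                # assumed bbox padding factor (assumption)

def affine_params(bbox):
    """Center and padded, aspect-corrected size of the crop region that a
    TopDownAffine-style step warps onto the model input plane (a sketch)."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * PADDING, (y2 - y1) * PADDING
    aspect = MODEL_W / MODEL_H
    # Grow the shorter side so the crop matches the model input's aspect ratio.
    if w / h > aspect:
        h = w / aspect
    else:
        w = h * aspect
    return cx, cy, w, h

def model_to_image(pt, bbox):
    """Inverse of the crop warp: map a point given in model-input
    coordinates back into the original image's coordinate system."""
    cx, cy, w, h = affine_params(bbox)
    x, y = pt
    return (x * w / MODEL_W + cx - w / 2,
            y * h / MODEL_H + cy - h / 2)

# The center of the model input plane maps back to the bbox center:
print(model_to_image((96, 128), (100, 100, 200, 300)))  # (150.0, 200.0)
```

The point is that this inverse needs the per-box center and padded size, which only the tracker knows; it could apply this mapping internally before returning bounding boxes to the consumer.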
Describe the bug
The returned pose tracking results include both an array of pose detections and an array of the bounding boxes within which those poses were detected.
However, even though the returned coordinates of the pose detections are scaled correctly in relation to the original input image, the same is not done for the returned bounding boxes.
Furthermore, this seems difficult for the API consumer to work around, because intermediate image sizes and/or scale factors do not seem to be exposed by the API. In other words, the API consumer is given no clue as to what the bounding box coordinates need to be multiplied by to obtain dimensions that make sense in relation to the original input image.
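A small numeric sketch of why the consumer cannot undo this on their own, assuming a TopDownAffine-style warp with a 192x256 input and 1.25 padding factor (both assumptions, not mmdeploy's exact defaults): the forward warp depends on a per-box translation and aspect correction, not just one global scale.

```python
MODEL_W, MODEL_H = 192, 256   # assumed pose-model input size (assumption)
PADDING = 1.25                # assumed bbox padding factor (assumption)

def image_to_model(pt, bbox):
    """Forward direction of the crop warp: where a point in the original
    image lands in model-input coordinates (a sketch of the geometry)."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * PADDING, (y2 - y1) * PADDING
    aspect = MODEL_W / MODEL_H
    if w / h > aspect:
        h = w / aspect
    else:
        w = h * aspect
    x, y = pt
    # Per-box translation (cx - w/2, cy - h/2) plus per-box scale (W/w, H/h):
    return ((x - (cx - w / 2)) * MODEL_W / w,
            (y - (cy - h / 2)) * MODEL_H / h)

print(image_to_model((150, 200), (100, 100, 200, 300)))  # (96.0, 128.0)
```

Every detected box is warped into the same fixed model-input plane with its own translation and scale, so without the API exposing those per-box parameters there is no single factor the consumer can multiply the returned coordinates by.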
Reproduction
Simply run the mmdeploy-provided pose tracker demo for Python, or for any other supported language. As input, use one or more small images (640x480 or smaller) clearly depicting a person. The returned bounding box's bottom-right x/y will be larger than the width/height of the entire input image.
Environment
Error traceback
No response