mks0601 / 3DMPPE_ROOTNET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

Why enlarge the bbox? #29

Closed: rajatongit closed this issue 3 years ago

rajatongit commented 3 years ago

Hi, thank you for providing the code and such an insightful paper. I have a question: in line 63 you enlarge the bounding box by 25% in both directions. Before that, however, you sanitize the bounding box so that [xmin, ymin, xmax, ymax] all lie within the image dimensions. Is it possible that, after enlarging, (xmin, ymin) or (xmax, ymax) ends up outside the image dimensions?

https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/blob/8bef0cd332c3423050a6f3b382d2a574623e1ffa/common/utils/pose_utils.py#L63

Secondly, I have a basic question: are we keeping the aspect ratio of the bounding boxes at 1:1 because the human is assumed to occupy 2000mm x 2000mm?

mks0601 commented 3 years ago

Hi,

For the first question: yes, it can exceed the image size, but the code here and here handles that case.

For the second question, yes.
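To make the first answer concrete, here is a minimal sketch of the sanitize-then-enlarge sequence and of one common way to crop a box that extends past the image (filling the missing pixels with a constant). The function names `sanitize_xyxy`, `enlarge_xyxy`, and `crop_with_padding` are hypothetical and are not taken from the repository; the 1:1 aspect ratio and the 1.25 factor come from the discussion above, and constant padding is an assumption about how such boxes can be handled, not a description of the linked code.

```python
def sanitize_xyxy(box, img_w, img_h):
    """Clamp an [xmin, ymin, xmax, ymax] box to the image bounds."""
    xmin, ymin, xmax, ymax = box
    xmin, ymin = max(0.0, xmin), max(0.0, ymin)
    xmax, ymax = min(img_w - 1.0, xmax), min(img_h - 1.0, ymax)
    if xmax <= xmin or ymax <= ymin:
        return None  # degenerate box
    return [xmin, ymin, xmax, ymax]


def enlarge_xyxy(box, aspect_ratio=1.0, scale=1.25):
    """Pad the box to the target aspect ratio (w / h), then scale it about its center."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    cx, cy = xmin + w / 2.0, ymin + h / 2.0
    if w > aspect_ratio * h:
        h = w / aspect_ratio
    else:
        w = h * aspect_ratio
    w, h = w * scale, h * scale
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]


def crop_with_padding(image, box, pad_value=0):
    """Crop a box from an image given as a list of rows.

    Pixels that fall outside the image are filled with pad_value
    (an assumed strategy for out-of-image regions).
    """
    img_h, img_w = len(image), len(image[0])
    xmin, ymin, xmax, ymax = (int(round(v)) for v in box)
    return [
        [image[y][x] if 0 <= x < img_w and 0 <= y < img_h else pad_value
         for x in range(xmin, xmax)]
        for y in range(ymin, ymax)
    ]


# A person standing at the left edge of a 100x100 image.
tight = sanitize_xyxy([0, 10, 40, 90], img_w=100, img_h=100)
enlarged = enlarge_xyxy(tight)
print(enlarged)  # xmin is negative: the enlarged box leaves the image

# Cropping a box that starts one pixel outside a 4x4 image of ones.
patch = crop_with_padding([[1] * 4 for _ in range(4)], [-1, -1, 3, 3])
print(patch[0], patch[1])
```

In this example the sanitized box is inside the image, but after padding to 1:1 and scaling by 1.25 its left edge moves to a negative x coordinate, which is the situation asked about in the first question.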

rajatongit commented 3 years ago

@mks0601 Thank you very much for your response. :) Just as a small follow-up, I notice that you mention in the README that we must use tight (not extended) bounding boxes during evaluation on different datasets. Then why do we intentionally enlarge the bounding box in the above code?

mks0601 commented 3 years ago

Because the code automatically enlarges the box. If you feed it an already enlarged box, the box will be enlarged twice.
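A quick arithmetic sketch of why this matters, assuming the 1.25 factor mentioned earlier in the thread is applied to each side (the helper `enlarge_side` is hypothetical, for illustration only):

```python
SCALE = 1.25  # enlargement factor discussed in this thread


def enlarge_side(side, scale=SCALE):
    """Scale one side length of a box."""
    return side * scale


tight_side = 100.0
once = enlarge_side(tight_side)   # what the code does to a tight box
twice = enlarge_side(once)        # what happens if the input was already enlarged

print(once, twice)
print((twice / tight_side) ** 2)  # area relative to the tight box
```

A box enlarged twice ends up 1.25 x 1.25 = 1.5625 times the tight side length, so the crop covers roughly 2.4 times the tight area instead of the intended 1.5625 times.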

rajatongit commented 3 years ago

True, thank you for the clarification. Sorry to take the issue one step further, and maybe it is a rookie question: what is the motivation behind enlarging the bounding box? I could imagine that the tight bounding boxes are so tight that the hair or some clothing (like hats, haha, or shoes) is not included. I don't know if I am thinking in the right direction, but what is your motivation for enlarging the boxes?

mks0601 commented 3 years ago

This is a kind of convention in human pose estimation. I think it is to make sure body parts are not cut off after scale/rotation augmentation.
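The intuition can be illustrated with a toy model, under the assumption that augmentation scales and rotates the content about the box center while the crop window stays fixed. The helpers `transform_point` and `inside`, the box coordinates, and the augmentation values (scale 1.2, rotation 10 degrees) are all hypothetical choices for illustration, not values from the repository.

```python
import math


def transform_point(pt, center, scale, rot_deg):
    """Scale and rotate a point about a center (toy augmentation model)."""
    a = math.radians(rot_deg)
    dx, dy = (pt[0] - center[0]) * scale, (pt[1] - center[1]) * scale
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))


def inside(pt, box):
    """True if the point lies within an [xmin, ymin, xmax, ymax] box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax


tight_box = [100.0, 100.0, 300.0, 300.0]    # 200 x 200 tight box
enlarged_box = [75.0, 75.0, 325.0, 325.0]   # same center, sides x 1.25
center = (200.0, 200.0)

head = (200.0, 105.0)  # keypoint just inside the top edge of the tight box
augmented = transform_point(head, center, scale=1.2, rot_deg=10.0)

print(inside(augmented, tight_box))     # False: the keypoint leaves the tight crop
print(inside(augmented, enlarged_box))  # True: the margin keeps it in view
```

With these values the augmented keypoint moves above the top edge of the tight box but remains inside the enlarged one. Larger augmentations can still push parts outside the enlarged box, so the margin reduces the problem rather than eliminating it.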

rajatongit commented 3 years ago

Oh, good to know. Yes, that seems a more plausible reason than what I said! Thank you, closing the issue now! 👍