microsoft / MeshTransformer

Research code for CVPR 2021 paper "End-to-End Human Pose and Mesh Reconstruction with Transformers"
https://arxiv.org/abs/2012.09760
MIT License

bounding boxes #54

Closed: mahsaep closed this issue 2 years ago

mahsaep commented 2 years ago

Hi and thanks for releasing the code!

From the visualizations that I generated during training and inference, I realised that the box used to crop the image around a person (the crop that is given as input to the model at train and test time) is not tight around that person and sometimes includes other people in the image as well. Could you please let me know the reason? Thanks!

kevinlin311tw commented 2 years ago

During our GT preparation, we expanded each bbox by a factor of 1.2 (you could check the relevant code here). So at inference, the bboxes should be slightly larger than the original GT bboxes, and sometimes you may see a larger bbox that includes other people. This scaling process is adopted from the GraphCMR (CVPR 2019) implementation.
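As a rough illustration of the 1.2x expansion described above, here is a minimal sketch. It assumes the box is given as corner coordinates and is scaled about its center; the function name `expand_bbox` and this corner-based representation are hypothetical and are not taken from the repository, whose preprocessing may store the box differently (for example as a center and a scale).

```python
def expand_bbox(x_min, y_min, x_max, y_max, factor=1.2):
    """Scale a box about its center by `factor` (hypothetical helper).

    Returns the expanded corners as (x_min, y_min, x_max, y_max).
    """
    # The center stays fixed while width and height grow by `factor`.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * factor / 2.0
    half_h = (y_max - y_min) * factor / 2.0
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


# A 100 x 200 box gains a 10 px margin on the left/right
# and a 20 px margin on the top/bottom.
expanded = expand_bbox(0, 0, 100, 200)
```

With this kind of margin, a neighbour standing close to the subject can easily fall inside the crop, which matches the behaviour reported in the question.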

Further, during training, we apply data augmentation that randomly scales the bbox to be slightly larger or smaller. Thus, the bbox may not be tight around the person.
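The random scaling mentioned above could look roughly like the sketch below. The name `jitter_scale`, the uniform sampling, and the `scale_range` value of 0.25 are all assumptions made for illustration; the thread does not state the distribution or range actually used in training.

```python
import random


def jitter_scale(base_scale, scale_range=0.25, rng=random):
    """Randomly enlarge or shrink a bbox scale (hypothetical helper).

    The multiplier is drawn uniformly from
    [1 - scale_range, 1 + scale_range]; both the distribution and the
    default range are illustrative assumptions.
    """
    factor = rng.uniform(1.0 - scale_range, 1.0 + scale_range)
    return base_scale * factor


# Each training sample gets a different crop size around the same person.
rng = random.Random(0)  # seeded only to make the example repeatable
scales = [jitter_scale(1.2, rng=rng) for _ in range(5)]
```

Because the multiplier can exceed 1, some training crops end up looser than the already expanded box, while others are tighter.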

kevinlin311tw commented 2 years ago

Please re-open this thread if you encounter relevant issues or bugs. Thank you.