hongsukchoi / 3DCrowdNet_RELEASE

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022
MIT License

How to predict the whole person in the image? #14

Closed by xljh0520 1 year ago

xljh0520 commented 2 years ago

Hi, thanks for sharing your code. I notice that this model takes a cropped and resized image as input and is trained to predict SMPL parameters and camera parameters for one person at a time. As a result, if there is more than one person in the image, we detect each person and crop the image using the detection results. I'm wondering how to feed in the original image without cropping, and I have a few questions about handling the dataset. Could you help me with them?

  1. For the camera parameters, do I need to predict them per person or per image? (An image may contain many people, and I don't plan to crop it.)
  2. Which keypoints in the targets need to be changed?
hongsukchoi commented 2 years ago
  1. Yes, per person. The camera parameters are actually the translation of each person in the camera frame.

  2. I don't understand the question. 3DCrowdNet is a top-down method, and cropping is required to get proper image features.
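
To make the top-down cropping step concrete, here is a minimal sketch of how a detection box is typically turned into a fixed-aspect crop region, and how a keypoint is mapped between original image pixels and the resized crop. This is not code from this repository: the function names, the `(x, y, w, h)` box layout, the 1.25 enlargement factor, and the 256x256 input size are all assumptions chosen for illustration.

```python
def expand_bbox(bbox, aspect_ratio=1.0, scale=1.25):
    """Enlarge a detection box (x, y, w, h) around its center.

    The box is first padded to the requested width/height aspect ratio,
    then scaled. Both defaults are illustrative assumptions.
    """
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    if w > aspect_ratio * h:
        h = w / aspect_ratio      # box too wide: grow the height
    else:
        w = aspect_ratio * h      # box too tall: grow the width
    w, h = w * scale, h * scale
    return (cx - w / 2.0, cy - h / 2.0, w, h)


def to_crop_coords(point, crop_box, out_size):
    """Map an (x, y) point from original image pixels into the resized crop."""
    x, y = point
    bx, by, bw, bh = crop_box
    out_w, out_h = out_size
    return ((x - bx) * out_w / bw, (y - by) * out_h / bh)


def to_image_coords(point, crop_box, out_size):
    """Inverse of to_crop_coords: map a crop-space point back to the image."""
    x, y = point
    bx, by, bw, bh = crop_box
    out_w, out_h = out_size
    return (x * bw / out_w + bx, y * bh / out_h + by)


# One crop region per detected person (hypothetical detections).
detections = [(100, 50, 40, 80), (300, 60, 60, 90)]
crop_boxes = [expand_bbox(b) for b in detections]

# The center of the first detection lands at the center of its 256x256 crop.
print(to_crop_coords((120, 90), crop_boxes[0], (256, 256)))  # (128.0, 128.0)
```

In a pipeline like this, 2D keypoint targets would be expressed in crop coordinates for each person, which is one reason a single set of targets for the whole uncropped image does not map directly onto a top-down model.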