mks0601 / 3DMPPE_ROOTNET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

Measurement of bbox_real #37

Closed: LeftAttention closed this issue 2 years ago

LeftAttention commented 2 years ago

Congratulations on the paper.

I am working on pose correction for camera parameters such as camera angle and perspective transforms, and I would like to use this algorithm. I have a few questions.

In this line, you set bbox_real = (2000, 2000). How can we measure this for in-the-wild images?

Thanks in advance.

mks0601 commented 2 years ago

bbox_real is a constant; you do not measure it. We use RootNet to refine the estimate built from that constant.
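For context: in the RootNet paper, the constant bbox_real enters through a distance measure k = sqrt(f_x * f_y * A_real / A_img), where A_real is the assumed real-world area of a person (here 2000 mm x 2000 mm) and A_img is the bounding-box area in pixels; RootNet then predicts an image-based correction on top of k. A minimal sketch of computing k, assuming bbox_real is in mm and the box size is given as width and height in pixels (the helper name `k_value` is hypothetical, chosen for illustration):

```python
import math

# Assumed real-world size (mm) of the area a person occupies; (2000, 2000)
# is the constant quoted in this issue.
BBOX_REAL = (2000.0, 2000.0)


def k_value(focal, bbox_wh, bbox_real=BBOX_REAL):
    """Approximate root depth (mm) from the pinhole model:
    k = sqrt(f_x * f_y * A_real / A_img).

    focal:   (f_x, f_y) in pixels
    bbox_wh: (width, height) of the person box in pixels
    """
    area_real = bbox_real[0] * bbox_real[1]
    area_img = bbox_wh[0] * bbox_wh[1]
    if area_img <= 0:
        raise ValueError("bounding box must have positive area")
    return math.sqrt(focal[0] * focal[1] * area_real / area_img)


# A 300x300 px box with f = 1500 px gives k = 1500 * 2000 / 300 = 10000 mm.
print(k_value((1500.0, 1500.0), (300.0, 300.0)))  # prints 10000.0
```

Because k only uses the box area, a box that is too loose or too tight directly biases the depth estimate before RootNet's correction is applied.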

LeftAttention commented 2 years ago

Thanks for the quick reply.

For a straight image like this one, I get the following 3D pose, which does not seem right. Please find the 3D poses below.

(Attached images: Figure_1, Figure_2, Figure_3, Figure_4)

Thanks.

LeftAttention commented 2 years ago

For person bounding-box detection I am using EfficientDet. For this image it returns [ 234.44705, 768.4173 , 1783.121 , 2857.573 ], in [xmin, ymin, xmax, ymax] format.
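As the later question in this thread suggests, the demo scripts appear to take person boxes as [xmin, ymin, width, height] rather than EfficientDet's [xmin, ymin, xmax, ymax]. If that is the case, the detector output needs converting before it is passed in. A minimal sketch (the helper name `xyxy_to_xywh` is hypothetical):

```python
def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [xmin, ymin, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# EfficientDet box reported in this issue.
box = xyxy_to_xywh([234.44705, 768.4173, 1783.121, 2857.573])
print(box)  # width ≈ 1548.67, height ≈ 2089.16
```

Passing the xmax/ymax values where a width/height is expected would inflate the box area, which in turn shrinks the depth estimate.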

After running RootNet with the following parameters, I got tensor([[ 32.3420, 32.2337, 1324.3123]], device='cuda:0'), where x and y are in pixels and z is the foot-relative depth (mm).

For RootNet, the camera parameters are:

```python
focal = [1500, 1500] # x-axis, y-axis
princpt = [original_img_width/2, original_img_height/2] # x-axis, y-axis
```

I used the same config for the PoseNet demo as well.

Could you tell me how I should modify the configs? Thanks in advance.
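The focal and princpt values in this comment are pinhole-camera intrinsics: a pixel (x, y) at depth Z maps to camera coordinates X = (x - c_x) * Z / f_x and Y = (y - c_y) * Z / f_y. A minimal NumPy sketch of that back-projection, assuming depth in mm and pixel coordinates in the original image (not the network's output space); the function name `back_project` and the 1920x1080 image size are hypothetical:

```python
import numpy as np


def back_project(pixel_coord, focal, princpt):
    """Lift (x, y, z) points to camera space with a pinhole model.

    pixel_coord: (N, 3) array of x, y in image pixels and z = depth in mm
    focal:       (f_x, f_y) in pixels
    princpt:     (c_x, c_y) in pixels
    Returns an (N, 3) array of X, Y, Z in mm.
    """
    pixel_coord = np.asarray(pixel_coord, dtype=np.float64)
    z = pixel_coord[:, 2]
    x = (pixel_coord[:, 0] - princpt[0]) / focal[0] * z
    y = (pixel_coord[:, 1] - princpt[1]) / focal[1] * z
    return np.stack([x, y, z], axis=1)


# Default intrinsics from this issue, for a hypothetical 1920x1080 image.
width, height = 1920, 1080
focal = (1500.0, 1500.0)
princpt = (width / 2, height / 2)
# A point at the principal point lies on the optical axis (X = Y = 0).
print(back_project([[960.0, 540.0, 3000.0]], focal, princpt))
```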

mks0601 commented 2 years ago

If the bounding box tightly covers the human area, I think there should be no problem. Could you visualize the bounding box?

LeftAttention commented 2 years ago

Yes, the bounding box fits almost tightly. Did I miss anything?

LeftAttention commented 2 years ago

Any suggestions on how I should modify the camera parameters to test in-the-wild images? Thanks.

mks0601 commented 2 years ago

You do not have to change the camera parameters. The output 3D pose is in the focal length [1500,1500] space. I don't think you made a mistake, and the output 3D pose does not look very wrong to me. The z-axis represents the distance from the camera, measured perpendicular to the image plane through the image center. It seems that the upper body of the person in the picture is closer to the camera than the lower body?
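One way to read "the output 3D pose is in the focal length [1500,1500] space": since k grows with sqrt(f_x * f_y), a root depth obtained with the default focal length can be re-expressed for a camera whose focal length is known by rescaling. The sketch below assumes the predicted depth is k multiplied by a focal-independent, image-based correction; that is an assumption about RootNet's output, not something confirmed in this thread. The function name is hypothetical.

```python
import math


def rescale_root_depth(z_mm, assumed_focal, true_focal):
    """Rescale a root depth computed with assumed_focal to true_focal.

    Assumes depth is proportional to k = sqrt(f_x * f_y * A_real / A_img),
    i.e. that the network's correction does not depend on the focal length.
    """
    scale = math.sqrt(true_focal[0] * true_focal[1]) / math.sqrt(
        assumed_focal[0] * assumed_focal[1]
    )
    return z_mm * scale


# Root depth reported in this issue (focal = [1500, 1500]), re-expressed for
# a hypothetical camera with a 3000 px focal length: the depth doubles.
print(rescale_root_depth(1324.3123, (1500.0, 1500.0), (3000.0, 3000.0)))
```

With f_x = f_y, the back-projected X = (x - c_x) * Z / f stays the same under this rescaling, because Z and f scale together; only the depth changes.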

liamsun2019 commented 2 years ago

@LeftAttention Hi, have you resolved this issue? Your results do not look good.

vigneshrk29 commented 1 year ago

Hi,

Did you get it working? If so, did you change your bounding box to [xmin, ymin, width, height], and how did you calculate the width and height?

Thanks