Two questions about the dataset processing

mks0601 / Hand4Whole_RELEASE

Official PyTorch implementation of "Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation", CVPRW 2022 (Oral.)

MIT License


Two questions about the dataset processing #36

Closed linjing7 closed 2 years ago

linjing7 commented 2 years ago

Hi, thanks for your excellent work! I have two questions about the dataset:

1. What is the meaning of extension_ratio in the following code, and why does it differ between the hand (2.0) and the face (1.5)?

bbox[:,2] = w*extension_ratio
bbox[:,3] = h*extension_ratio
bbox[:,0] = c_x - bbox[:,2]/2.
bbox[:,1] = c_y - bbox[:,3]/2.

2. When calculating the projection loss, why do you need to transform the projected hand joint coordinates according to the hand bbox (cfg.output_hm_shape -> hand bbox space)? Is this designed to prevent the projection loss on the face and hand parts from being ignored because these two parts have such a small resolution in the output_hm_shape space?

mks0601 commented 2 years ago

1. Hand box detection is usually much more difficult than face box detection, as hands are often occluded. Hence, I enlarged the hand box more than the face box so that the hand is less likely to be missed.
2. In the dataloader, all GT 2D joint coordinates are defined in the downsampled body image space. However, during the forward pass, all the estimated 2D hand joint coordinates are defined in the cropped and downsampled hand-only image space. Hence, we should apply an affine transform that maps the downsampled body image space -> the downsampled hand-only image space.
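The box enlargement from the first answer can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each box is stored as (x_min, y_min, width, height), which is what the quoted snippet implies, and the helper name enlarge_bbox and the example box are hypothetical.

```python
import numpy as np

def enlarge_bbox(bbox, extension_ratio):
    """Scale (N, 4) boxes about their centers by extension_ratio.

    Assumed layout per row: (x_min, y_min, width, height).
    """
    bbox = np.asarray(bbox, dtype=np.float32).copy()
    # Box centers stay fixed while width and height grow.
    c_x = bbox[:, 0] + bbox[:, 2] / 2.0
    c_y = bbox[:, 1] + bbox[:, 3] / 2.0
    w, h = bbox[:, 2].copy(), bbox[:, 3].copy()
    bbox[:, 2] = w * extension_ratio
    bbox[:, 3] = h * extension_ratio
    bbox[:, 0] = c_x - bbox[:, 2] / 2.0
    bbox[:, 1] = c_y - bbox[:, 3] / 2.0
    return bbox

# Hypothetical tight box, enlarged with the two ratios from the question.
tight = np.array([[10.0, 20.0, 4.0, 6.0]])
hand_box = enlarge_bbox(tight, 2.0)  # larger margin for hands
face_box = enlarge_bbox(tight, 1.5)  # smaller margin for faces
```

With the same tight box, the hand ratio gives a wider margin around the same center than the face ratio does, which is the trade-off the answer describes.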

linjing7 commented 2 years ago

Okay, thanks for your patient reply.

In A2, you mention that "all the estimated 2D hand joint coordinates are defined in the cropped and downsampled hand-only image space".

Yes, lhand_joint_img is in the hand-only image space. But it seems that joint_proj['lhand'] is already in the downsampled body image space? If so, why not compute the projection loss against the GT 2D joint coordinates in the downsampled body image space?

joint_proj, joint_cam, mesh_cam = self.get_coord(root_pose, body_pose, lhand_pose, rhand_pose, jaw_pose, shape, expr, cam_trans, mode)

mks0601 commented 2 years ago

joint_proj['lhand'] is not in the downsampled body image space. It is in the cropped and downsampled hand-only image space.
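To compare like with like, GT joints defined in the downsampled body image space have to be mapped into that hand-only space before the projection loss is computed. A minimal sketch of such a mapping, assuming an axis-aligned crop (translate, then scale); the function name body_to_hand_space, the (x_min, y_min, x_max, y_max) box layout, and the heatmap size are hypothetical and not taken from the repository.

```python
import numpy as np

def body_to_hand_space(joints_xy, hand_bbox, hand_hm_shape):
    """Map (N, 2) x/y joints from body heatmap space into hand-crop heatmap space.

    hand_bbox: (x_min, y_min, x_max, y_max) of the hand crop, expressed in
               body heatmap space (assumed layout).
    hand_hm_shape: (height, width) of the hand-only heatmap (assumed).
    """
    joints_xy = np.asarray(joints_xy, dtype=np.float32)
    x_min, y_min, x_max, y_max = hand_bbox
    hm_h, hm_w = hand_hm_shape
    out = np.empty_like(joints_xy)
    # Shift so the crop's corner is the origin, then rescale to the hand heatmap.
    out[:, 0] = (joints_xy[:, 0] - x_min) / (x_max - x_min) * hm_w
    out[:, 1] = (joints_xy[:, 1] - y_min) / (y_max - y_min) * hm_h
    return out

# Hypothetical GT joints in body space and a hypothetical hand crop.
gt_body = np.array([[10.0, 20.0], [12.0, 24.0]])
gt_hand = body_to_hand_space(gt_body, (8.0, 16.0, 16.0, 32.0), (8, 8))

# A prediction already in hand space can now be compared directly (L1 loss).
pred_hand = np.array([[2.5, 2.0], [4.0, 3.0]])
loss = np.abs(pred_hand - gt_hand).mean()
```

Without this step the loss would subtract coordinates measured in two different frames, so the error would reflect the offset between the frames rather than the quality of the prediction.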

linjing7 commented 2 years ago

Okay, I'll double-check. Thank you very much for your patient reply!