mks0601 / Hand4Whole_RELEASE

Official PyTorch implementation of "Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation", CVPRW 2022 (Oral.)
MIT License

Coordinate inconsistence between face projected joint and face target joint? #68

Open linjing7 opened 1 year ago

linjing7 commented 1 year ago

Hi, thanks for your excellent work. When computing the joint projection loss, the coordinates of the projected face joints and the target face joints seem inconsistent. The projected face joints joint_proj[:,smpl_x.joint_part['face'],:] are converted to face_bbox_shape space in this part, but the target face joints targets['joint_img'][:,smpl_x.joint_part['face'],:] are still in output_hm_shape space. Is that right?

linjing7 commented 1 year ago

It seems that the projected face joint coordinates are not actually converted to the face bbox space here; only a global translation alignment is performed. Could you please tell me why you perform the global translation alignment?

mks0601 commented 1 year ago

Hi, thanks for your question. I think you're right. I actually have no idea why I did not change the face joint coordinates to the face bbox space. I think this was a mistake on my side :( There should be some code like this to change the target coordinates to the face bbox space.
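
The conversion described here can be sketched as follows. This is a minimal NumPy illustration, not the repository's code: the function name to_bbox_space, the (xmin, ymin, xmax, ymax) box layout, and the assumption that the box is already expressed in the same space as the joints are all hypothetical. It shifts the joints by the box origin and rescales them to the crop's heatmap resolution.

```python
import numpy as np


def to_bbox_space(joints_xy, bbox_xyxy, bbox_hm_shape):
    """Map (J, 2) x/y joint coordinates into a crop's heatmap grid.

    Assumptions (hypothetical, for illustration only):
      - joints_xy and bbox_xyxy live in the same source space,
      - bbox_xyxy is (xmin, ymin, xmax, ymax),
      - bbox_hm_shape is (height, width) of the crop heatmap.
    """
    joints_xy = np.asarray(joints_xy, dtype=np.float64)
    xmin, ymin, xmax, ymax = bbox_xyxy
    origin = np.array([xmin, ymin], dtype=np.float64)
    # Per-axis scale from box size to crop heatmap size (x uses width, y uses height).
    scale = np.array(
        [bbox_hm_shape[1] / (xmax - xmin), bbox_hm_shape[0] / (ymax - ymin)],
        dtype=np.float64,
    )
    return (joints_xy - origin) * scale
```

Applying the same mapping to both the target joints and the projected joints would keep the two in one coordinate system before the loss is computed.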

linjing7 commented 1 year ago

Hi, thanks for your reply. I also found a global translation alignment of the projected hand joints here. Could you please tell me why we need the global translation alignment?

mks0601 commented 1 year ago

This is not experimentally validated, but my thought was that the global position of hand joints in the whole-body case would often be inaccurate because hands are at the end of the human kinematic chain. Accumulated 3D rotation errors can often make the global position of the hands wrong. Even in that case, I wanted the finger poses to be correct, which is the reason for the global translation alignment.
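
A global translation alignment of this kind can be sketched as below. This is a simplified NumPy illustration under stated assumptions, not the repository's implementation: the function name align_translation and the boolean valid mask (marking joints that have a usable target) are hypothetical. The projected joints are shifted by the mean offset to the targets over the valid joints, so only the relative layout of the joints remains to be penalized.

```python
import numpy as np


def align_translation(pred_xy, target_xy, valid):
    """Shift (J, 2) predicted joints by the mean offset to the targets.

    valid is a length-J boolean mask (hypothetical) selecting joints whose
    targets are usable. If no joint is valid, the prediction is returned
    unchanged.
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    target_xy = np.asarray(target_xy, dtype=np.float64)
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        return pred_xy.copy()
    # Mean 2D offset over valid joints; one shared shift for the whole part.
    offset = (target_xy[valid] - pred_xy[valid]).mean(axis=0)
    return pred_xy + offset
```

After this step, a prediction that differs from the target only by a constant 2D shift incurs zero loss, which is why the alignment favours finger pose over the absolute position of the hand.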

linjing7 commented 1 year ago

Okay, thanks for your patient reply. It makes sense! I wonder whether you also perform this global translation operation in NeuralAnnot. I'm worried that it would make the global position of the estimated hand joints inaccurate. When I visualize the results on the EHF dataset, I find that the local hand pose is rather accurate, but it is not well aligned with the hand in the RGB image. Could this be caused by this operation?

mks0601 commented 1 year ago

For the whole-body NeuralAnnot, I just copied the finger poses from the hand-only NeuralAnnot to the smplx NeuralAnnot, so I didn't do the global translation. There would be a trade-off if you turn off the global translation: the global position of the hands would become better, but the finger pose accuracy might drop. I think that should be investigated with some experiments. If you're going to do that, could you also let me know your conclusion? Thanks!

linjing7 commented 1 year ago

Hi, I think your copy-paste approach is equivalent to fitting the whole-body pose with the SMPLX model and then aligning the predicted wrist joint with the GT wrist joint before computing the hand joint projection loss. Is that right? I'll conduct experiments, and if I find any interesting conclusions, I'll share them with you :)
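
The wrist-anchored variant mentioned here could look roughly like the sketch below. It is a hypothetical NumPy illustration: the function name wrist_aligned_l1 and the convention that the wrist is joint index 0 are assumptions, not taken from the repository. The prediction is shifted so that its wrist coincides with the GT wrist, and an L1 loss is then taken over all joints.

```python
import numpy as np


def wrist_aligned_l1(pred_xy, target_xy, wrist_idx=0):
    """Mean L1 error after moving the predicted wrist onto the GT wrist.

    pred_xy and target_xy are (J, 2) arrays; wrist_idx (assumed to be 0
    here) selects the wrist joint used as the anchor.
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    target_xy = np.asarray(target_xy, dtype=np.float64)
    # Single shared shift defined by the wrist only.
    offset = target_xy[wrist_idx] - pred_xy[wrist_idx]
    return float(np.abs(pred_xy + offset - target_xy).mean())
```

Unlike a mean-offset alignment, this anchors on a single joint, so any error in the wrist itself propagates to every finger joint in the loss.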

mks0601 commented 1 year ago

I guess so. Looking forward to your updates!