mks0601 / I2L-MeshNet_RELEASE

Official PyTorch implementation of "I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image", ECCV 2020

Generate parsed data #88

Closed · pangyyyyy closed this 3 years ago

pangyyyyy commented 3 years ago

Hi, thanks for the great work! I'd like to ask how you generated the parsed data from the original datasets. Would you mind providing the code for that?

Thank you!!

mks0601 commented 3 years ago

Which datasets do you want?

pangyyyyy commented 3 years ago

Hello! Thanks for your reply

I'm keen to understand how you generated the parsed data, especially for the 3DPW and MuCo datasets. Would you be willing to share the preprocessing steps/scripts for these datasets?

mks0601 commented 3 years ago

I'm afraid I cannot share the code, but it is not very difficult to generate such json files. You can open them and inspect their dictionary structure.

pangyyyyy commented 3 years ago

No worries! I tried to derive the same values as in your 3DPW parsed data, but I was unable to obtain the same values for `fitted_joint_cam`. You mentioned in your readme that `fitted_joint_cam` is "24x3 joint coordinates (SMPL joint set) in camera-centered 3D space (meter unit), which is regressed from the SMPL mesh".
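(For context, my reading of "regressed from the SMPL mesh" is that the joints are a fixed linear map of the mesh vertices; a minimal sketch with placeholder arrays, where the real `J_regressor` matrix ships with the SMPL model files:)

```python
import numpy as np

# Sketch only: SMPL joints as a fixed linear combination of mesh vertices.
J_regressor = np.zeros((24, 6890))  # placeholder; load from the SMPL model files in practice
vertices = np.zeros((6890, 3))      # placeholder; posed mesh vertices (meters)
joints = J_regressor @ vertices     # (24, 3) SMPL joint set
```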

I have a few questions: (1) What are your inputs into the SMPL regressor? (2) Did you use a gendered or gender-neutral regressor? (3) Was any translation applied? (4) Did you offset the SMPL outputs by root coordinates obtained from RootNet?

Hope to get your advice on this, thanks!

mks0601 commented 3 years ago

I forward the SMPL parameters provided in the original 3DPW dataset. However, the translation vectors in the original 3DPW are incorrect, so I performed rigid alignment of the SMPL output to the 'jointPositions' provided in the 3DPW dataset. I used a neutral gender.
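(A minimal sketch of the forwarding step using the `smplx` package; the placeholder values and `model_path` are mine, and the exact slicing of the 3DPW `poses`/`betas` arrays is an assumption to check against the sequence files:)

```python
import numpy as np
import torch
import smplx

poses = np.zeros(72)    # placeholder: per-frame 3DPW 'poses' (72-dim axis-angle)
betas = np.zeros(10)    # placeholder: first 10 entries of 3DPW 'betas'
model_path = 'models'   # placeholder: directory containing the SMPL model files

# Gender-neutral SMPL model, as described above.
model = smplx.create(model_path, model_type='smpl', gender='neutral')

output = model(
    global_orient=torch.tensor(poses[:3]).float().view(1, 3),
    body_pose=torch.tensor(poses[3:]).float().view(1, 69),
    betas=torch.tensor(betas).float().view(1, 10),
)
smpl_joints = output.joints[0, :24].detach().numpy()  # first 24 = SMPL joint set
```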

pangyyyyy commented 3 years ago

Thanks for the reply! Do you mind elaborating on how you performed "rigid alignment of the SMPL output to 'jointPositions'"?

I forwarded the SMPL betas and poses through a gender-neutral model. I tried to align my SMPL outputs using the root coordinate of the jointPositions provided in the 3DPW dataset, but I still got very different values from fitted_joint_cam. However, when I looked at the root joint coordinates obtained from RootNet, aligning the SMPL outputs to that root coordinate seems to give values similar to those in fitted_joint_cam.

For instance:

For annotation id: 32664

Actual offset needed to be applied to obtain `fitted_joint_cam` values from derived SMPL outputs: [1.1488, 0.2267, 5.6820] 
Root coordinate of `jointPositions`: [2.9659, -0.3631, -5.8923]
Root cam coordinate obtained from RootNet: [1.1495038270950317, 0.28018078207969666, 5.689423084259033]

mks0601 commented 3 years ago

You should also consider the camera extrinsics. Here is the rigid alignment code, which computes a 3D similarity transform (rotation/scale/translation).
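(The linked code is not reproduced in this thread; as a reference point, a standard Procrustes/similarity alignment looks roughly like the sketch below, fitting the scale, rotation, and translation that best map point set A onto point set B:)

```python
import numpy as np

def rigid_align(A, B):
    """Align A (N, 3) to B (N, 3) with the similarity transform (scale c,
    rotation R, translation t) minimizing ||c * R @ A.T + t - B.T||."""
    centroid_A, centroid_B = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - centroid_A, B - centroid_B

    # Optimal rotation via SVD of the cross-covariance matrix (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(A0.T @ B0)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # fix improper rotations (reflections)
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T

    c = S.sum() / (A0 ** 2).sum()        # optimal uniform scale
    t = centroid_B - c * R @ centroid_A  # optimal translation
    return (c * R @ A.T).T + t
```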

pangyyyyy commented 3 years ago

I performed rigid alignment of the SMPL outputs to jointPositions but was still unable to get the fitted_joint_cam values.

Sorry for the trouble; can I check if this is the correct procedure?

For annotation id: 32664

keypoints_3d = data['jointPositions'][0][0].reshape(24, 3)  # 3DPW GT 3D keypoints (world coordinates)
smpl_joints = np.array(output.joints).reshape(24, 3)        # joints from the forwarded SMPL model
aligned_joints = rigid_align(smpl_joints, keypoints_3d)     # align SMPL outputs to the 3D keypoints with the provided code

aligned_joints:
array([[ 2.96633185, -0.35136265, -5.88738065],
       [ 3.0315059 , -0.44573565, -5.85921113],
       [ 2.90759431, -0.44275249, -5.92828365],
       [ 2.97852218, -0.2397811 , -5.90929941],
       .....

fitted_joint_cam:
array([[ 1.14881543,  0.2267409 ,  5.68205287],
       [ 1.21106176,  0.3300526 ,  5.6515299 ],
       [ 1.09256051,  0.31742936,  5.74177306],
       [ 1.16648478,  0.10804464,  5.69278809],
       ....

smpl_joints:
array([[ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 6.22463301e-02,  1.03311732e-01, -3.05229668e-02],
       [-5.62549196e-02,  9.06884819e-02,  5.97201772e-02],
       [ 1.76693555e-02, -1.18696287e-01,  1.07352212e-02], 
       ....

keypoints_3d:
array([[ 2.96588857, -0.36313492, -5.89231832],
       [ 3.02422186, -0.4446724 , -5.8830483 ],
       [ 2.92139617, -0.45167833, -5.93622714],
       [ 2.98745489, -0.23942477, -5.91659344],
       .....

Note that fitted_joint_cam - fitted_joint_cam[0] = smpl_joints, so the joints match up to a root offset.

I initially thought that the problem might lie in finding the correct root coordinates. I was wondering if the root coordinates from RootNet were used?

mks0601 commented 3 years ago

The root coordinates from RootNet are predictions, so it is not appropriate to use them as GT (the json files contain GT). jointPositions are in the world coordinate system, so you should apply the camera extrinsics ('cam_poses') to bring them into the camera-centered coordinate system.

Please try projecting the coordinates into image space and visualizing them for debugging.
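(For anyone following along, a minimal sketch of both steps; the field names follow the 3DPW sequence files, and the assumption that 'cam_poses' maps world to camera coordinates is mine to verify:)

```python
import numpy as np

# Placeholders: in practice these come from the 3DPW sequence pickle.
cam_pose = np.eye(4)              # data['cam_poses'][frame], 4x4 extrinsic matrix
K = np.eye(3)                     # data['cam_intrinsics'], 3x3 intrinsic matrix
joints_world = np.zeros((24, 3))  # data['jointPositions'][p][frame].reshape(24, 3)

# World -> camera-centered coordinates (assuming cam_poses is world-to-camera).
R, t = cam_pose[:3, :3], cam_pose[:3, 3]
joints_cam = (R @ joints_world.T).T + t   # (24, 3), meters

# Camera -> image coordinates, for overlaying on the frame while debugging.
proj = (K @ joints_cam.T).T
joints_img = proj[:, :2] / proj[:, 2:3]   # (24, 2) pixel coordinates
```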

pangyyyyy commented 3 years ago

OK thanks for the clarification!

I obtained R, t from the camera extrinsics and transformed jointPositions into the camera coordinate system. Applying rigid alignment of the SMPL outputs to the cam-space keypoints gives an output close to (but not exactly the same as) the ground-truth fitted_joint_cam:

array([[ 1.14638129,  0.21608887,  5.68087656],
       [ 1.20532955,  0.31419452,  5.65175911],
       [ 1.09309508,  0.30212931,  5.73773498],
       [ 1.16324552,  0.10342685,  5.69099316],
       ....

I will try to visualise them for further debugging, thanks!

mks0601 commented 3 years ago

Sure :)