mks0601 / I2L-MeshNet_RELEASE

Official PyTorch implementation of "I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image", ECCV 2020
MIT License

Doubt regarding projection of mesh onto image #73

Closed. Shubhendu-Jena closed this issue 3 years ago.

Shubhendu-Jena commented 3 years ago

Hi again,

Thanks for your work. I want to project the mesh onto the image. Since the ground truth meshes (targets['fit_mesh_cam']) are root-relative, I noticed that in demo.py you use the root depth obtained from RootNet to make sure the projected mesh aligns with the image. In your other work, Pose2Pose, you obtain joint features for positional pose-guided pooling using bilinear interpolation on the image feature map; for this to work, the predicted joints have to be aligned with the image. How did you manage this? Essentially, I am trying to project the ground truth meshes onto the image without using the root depth from RootNet. Is this possible?

Thanks in advance
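
For context, a minimal sketch of the projection the question refers to: a root-relative mesh in camera coordinates plus an absolute root depth, projected with the standard pinhole model. The variable names (mesh_cam, root_depth, focal, princpt) are placeholders, not the exact names used in demo.py.

import numpy as np

# mesh_cam: (6890, 3) root-relative mesh in camera coordinates (mm), hypothetical input
# root_depth: absolute depth of the root joint (mm), e.g. obtained from RootNet
# focal, princpt: camera focal lengths and principal point in pixels
mesh_abs = mesh_cam.copy()
mesh_abs[:, 2] += root_depth                        # root-relative -> absolute depth
x = mesh_abs[:, 0] / mesh_abs[:, 2] * focal[0] + princpt[0]
y = mesh_abs[:, 1] / mesh_abs[:, 2] * focal[1] + princpt[1]
mesh_img = np.stack((x, y), axis=1)                 # (6890, 2) pixel coordinates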

mks0601 commented 3 years ago

The x- and y-axes of the positional pose in Pose2Pose are defined in image space. Therefore, the positional pose can be used to extract a feature vector from the feature map.
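
A minimal sketch of this kind of bilinear feature pooling (not the actual Pose2Pose code; feat_map and joint_xy are hypothetical tensors, with joint_xy given in feature-map pixel coordinates):

import torch
import torch.nn.functional as F

# feat_map: (1, C, H, W) image feature map; joint_xy: (J, 2) joint x,y in feature-map pixels
def pool_joint_features(feat_map, joint_xy):
    _, _, H, W = feat_map.shape
    # normalize pixel coordinates to [-1, 1] as expected by grid_sample
    grid = joint_xy.clone()
    grid[:, 0] = grid[:, 0] / (W - 1) * 2 - 1
    grid[:, 1] = grid[:, 1] / (H - 1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                              # (1, 1, J, 2)
    feat = F.grid_sample(feat_map, grid, align_corners=True)   # (1, C, 1, J)
    return feat[0, :, 0, :].permute(1, 0)                      # (J, C): one feature vector per joint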

Shubhendu-Jena commented 3 years ago

Thank you for the prompt response. However, a few things are still unclear to me. For the x- and y-axes of the positional pose in Pose2Pose to be defined in image space, they have to be supervised by ground truth that is also in image space. In the annotations you've provided with I2L-MeshNet, is targets['fit_joint_img'] aligned with the image space, and could I therefore obtain aggregated joint features in the same way using bilinear interpolation? I hope the above makes sense.

mks0601 commented 3 years ago

Yes, targets['fit_joint_img'] is aligned with the image space. You can visualize it for debugging purposes.
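
A minimal sketch of such a debugging visualization, assuming fit_joint_img is stored in heatmap (lixel) space and is rescaled to the network input resolution before drawing; cfg and the scaling factors follow the demo.py-style conventions but are not copied from the repo:

import cv2
import numpy as np

# img: (H, W, 3) network input image in [0, 1]; joint_img: (J, 3) array from targets['fit_joint_img']
vis = (img.copy() * 255).astype(np.uint8)
xy = joint_img[:, :2].copy()
xy[:, 0] = xy[:, 0] / cfg.output_hm_shape[2] * cfg.input_img_shape[1]  # heatmap x -> input-image x
xy[:, 1] = xy[:, 1] / cfg.output_hm_shape[1] * cfg.input_img_shape[0]  # heatmap y -> input-image y
for x, y in xy:
    cv2.circle(vis, (int(x), int(y)), 3, (0, 255, 0), -1)
cv2.imwrite('debug_joints.jpg', vis)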

Shubhendu-Jena commented 3 years ago

Thank you for the response. I tried to shade the image area corresponding to targets['fit_mesh_img'] and got the attached result (projected_mesh1). For some reason, the ground truth mesh seems to be inverted. I'd be grateful if you could help me understand why this is the case.

mks0601 commented 3 years ago

There is no visualization problem on my side. Could you provide your mesh visualization code?

Shubhendu-Jena commented 3 years ago

I didn't do mesh visualization as such. I just took the x and y coordinates from targets['fit_mesh_img'] and set the corresponding locations in the input image (after augmentation) to 0. The saved image then looks like the one above. The purpose was to verify whether the x and y coordinates of targets['fit_mesh_img'] are aligned with the augmented image.

mks0601 commented 3 years ago

Please see here: the current code overlays fit_mesh_img onto the image correctly. Please see demo.py.

Shubhendu-Jena commented 3 years ago

Hi. I tried projecting the mesh the way it is done in demo.py and got the result shown in the attached image (output_mesh_lixel). Is there still a problem, or is the projection meant to be slightly off?

mks0601 commented 3 years ago

Please run the demo code and check that the visualized mesh is correct. If the demo code runs correctly, then your mesh projection or some other function is probably not working properly.

Shubhendu-Jena commented 3 years ago

Yes, I figured out the mistake. Apologies for so many questions, and thank you for the prompt responses. Closing the issue now.

rawalkhirodkar commented 3 years ago

@Shubhendu-Jena can you please share your code? I am trying to visualize the ground truth on the images, and it seems to be very off.

# assuming the repo's demo.py-style imports for cfg and vis_mesh
from config import cfg
from utils.vis import vis_mesh

mesh_lixel_img = targets['fit_mesh_img'][0].cpu().numpy()  # 6890 x 3, in heatmap (lixel) space

# restore mesh_lixel_img to the network input image space and continuous depth space
mesh_lixel_img[:,0] = mesh_lixel_img[:,0] / cfg.output_hm_shape[2] * cfg.input_img_shape[1]
mesh_lixel_img[:,1] = mesh_lixel_img[:,1] / cfg.output_hm_shape[1] * cfg.input_img_shape[0]
mesh_lixel_img[:,2] = (mesh_lixel_img[:,2] / cfg.output_hm_shape[0] * 2. - 1) * (cfg.bbox_3d_size / 2)

raw_img = (255*img.copy())[...,::-1]  # [0, 1] RGB image -> [0, 255] BGR for OpenCV-style drawing
mesh_img = vis_mesh(raw_img.copy(), mesh_lixel_img)

EDIT: This works for full-body humans but not for crops (zoomed-in images). I guess this is by design.
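
If helpful for others hitting the crop issue: fit_mesh_img is defined in the bbox-cropped, affine-transformed network input space, so overlaying it on the original (uncropped) image requires applying the inverse of the crop affine transform. A minimal sketch, assuming you have access to the 2x3 affine img2bb_trans that generated the network input patch (the name and availability of this matrix are an assumption, not guaranteed by the dataloader outputs):

import cv2
import numpy as np

# xy: (N, 2) mesh x,y already restored to network input image space (as in the snippet above)
# img2bb_trans: 2x3 affine used to crop/resize the original image into the network input patch
bb2img_trans = cv2.invertAffineTransform(img2bb_trans.astype(np.float32))
xy_h = np.concatenate((xy, np.ones((xy.shape[0], 1), dtype=np.float32)), axis=1)  # homogeneous coords
xy_orig = xy_h @ bb2img_trans.T  # (N, 2) coordinates in the original full image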