mks0601 / I2L-MeshNet_RELEASE

Official PyTorch implementation of "I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image", ECCV 2020

Issue about h36m_smpl dataset #130

Closed · YHaooo-4508 closed 1 month ago

YHaooo-4508 commented 1 month ago

I have been following a lot of your work on the human body, and it is all great. I have some questions about the h36m_smpl dataset.

I downloaded the H36M_SMPL_DATA and the H36M_SMPLX_DATA from the link you shared, but I found that the SMPL-X data contains every frame, while the SMPL data is sampled at an interval of 5 frames.

Could you please share a per-frame version of H36M_SMPL_DATA, or explain how to generate such data? Looking forward to your reply!

mks0601 commented 1 month ago

You can use this https://drive.google.com/drive/folders/1ySxiuTCSdUEqbgTcx7bx02uMglPOkKjc

YHaooo-4508 commented 1 month ago

> You can use this https://drive.google.com/drive/folders/1ySxiuTCSdUEqbgTcx7bx02uMglPOkKjc

This is exactly what I need, thank you for your sharing!

YHaooo-4508 commented 1 month ago

> You can use this https://drive.google.com/drive/folders/1ySxiuTCSdUEqbgTcx7bx02uMglPOkKjc

Sorry to bother you again.

I generated vertices and 3D keypoints from the SMPL params. I think the vertices and 3D keypoints are in the world coordinate system, so I took the h36m cam_param from the link you shared. With the R and t, I transformed the coordinates into the camera space, but the resulting coord_cam is wrong. I also found that the 't' differs from the h36m cam_param I used before (the 'R' is the same).

In subject1_cam_1, the 't' from your data is [-346.05078140028075, 546.9807793144001, 5474.481087434061], while the 't' from the other data is [1841.10702774543, 4955.28462344526, 1563.4453958977].
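A plausible explanation for the mismatch, offered as a guess rather than a confirmed fact: the two files may store the translation under different conventions, a world-to-camera offset in one and the camera center in world coordinates in the other. If so, the values should satisfy t_shared = -R @ t_other. A minimal check, assuming R is the shared (3, 3) rotation from the camera json:

    import numpy as np

    # 't' from the shared json (suspected world-to-camera translation, in mm)
    t_shared = np.array([-346.05078140028075, 546.9807793144001, 5474.481087434061])
    # 't' from the other parameter set (suspected camera center in world coordinates)
    t_other = np.array([1841.10702774543, 4955.28462344526, 1563.4453958977])

    # with R loaded from the camera json, this should hold if the hypothesis is right:
    # np.allclose(t_shared, -R @ t_other, atol=1e-3)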

My partial code is as follows:

    output = smpl_model(pose_axis_angle=thetas[:, 3:], betas=betas,
                        global_orient=global_orient, transl=None, return_verts=True)
    vertices = output.vertices.detach().cpu().numpy().squeeze() * 1000
    joints_fk = output.joints.detach().cpu().numpy().squeeze() * 1000
    joints_regressor24 = output.joints_from_verts.detach().cpu().numpy().squeeze() * 1000
    extra_points = vertices[special_vertices]

    joints_36_world = np.concatenate([joints_regressor24, extra_points])
    joints_17_raw = np.array(kpt3d_data[action_id][subaction_id][frame])
    root_coord = joints_17_raw[0]
    joints_36_world += root_coord
    joints_36_cam = np.zeros_like(joints_36_world)
    for j in range(joints_36_world.shape[0]):
        joints_36_cam[j] = world2cam(joints_36_world[j], R, t)
    joints_36_img = cam2pixel(joints_36_cam, f, c)
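For context, the `cam2pixel` helper used above follows the repo's common camera utilities; the sketch below is a reconstruction consistent with that usage (f = focal lengths, c = principal point), not a verbatim copy:

    import numpy as np

    def cam2pixel(cam_coord, f, c):
        # standard pinhole projection: divide by depth, scale by focal length,
        # shift by the principal point; the z column keeps the camera-space depth
        x = cam_coord[:, 0] / cam_coord[:, 2] * f[0] + c[0]
        y = cam_coord[:, 1] / cam_coord[:, 2] * f[1] + c[1]
        z = cam_coord[:, 2]
        return np.stack((x, y, z), axis=1)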

mks0601 commented 1 month ago

Why do you do this? `joints_36_world += root_coord`

YHaooo-4508 commented 1 month ago

> Why do you do this? `joints_36_world += root_coord`

In smpl_model(), the obtained vertices and keypoints are relative to the root joint. I don't know whether the SMPL data is aligned with the world coordinate system of the h36m dataset, so when processing the SMPL data I subtracted the SMPL root coordinate:

    vertices = vertices - joints_from_verts_h36m[:, self.root_idx_17, :].unsqueeze(1).detach()
    joints = joints - joints[:, self.root_idx_smpl, :].unsqueeze(1).detach()
    joints_from_verts_h36m = joints_from_verts_h36m - joints_from_verts_h36m[:, self.root_idx_17, :].unsqueeze(1).detach()

So in `joints_36_world += root_coord`, joints_36_world is relative to the SMPL root joint, root_coord comes from Human36M_subject1_data.json, and the cam_param comes from Human36M_subject1_camera.json.

I want to extract the keypoint coordinates from the SMPL data and use the camera parameters to get keypoint_cam and keypoint_img.

mks0601 commented 1 month ago

If you're using my json files, first forward everything, including pose/shape/trans, through the SMPL layer. This gives world coordinates. Then apply the camera extrinsics to obtain camera coordinates.
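A minimal sketch of this pipeline, assuming the `smplx` package and hypothetical names for values loaded from the shared json files (pose as a (1, 72) axis-angle tensor, shape as (1, 10), trans as (1, 3), and per-camera R, t):

    import numpy as np
    import smplx

    # hypothetical model path; any SMPL layer with a transl argument works the same way
    smpl_layer = smplx.create('path/to/smpl', model_type='smpl')

    output = smpl_layer(betas=shape,
                        global_orient=pose[:, :3],
                        body_pose=pose[:, 3:],
                        transl=trans)  # keep the translation; do NOT root-subtract
    world_verts = output.vertices[0].detach().numpy()

    # Human3.6M world coordinates are in millimeters; if the SMPL params are in
    # meters, scale by 1000 here (an assumption to verify against the json files)
    world_verts = world_verts * 1000.0

    # camera extrinsics from the camera json: R is (3, 3), t is (3,)
    cam_verts = world_verts @ R.T + t  # x_cam = R @ x_world + t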

YHaooo-4508 commented 1 month ago

> If you're using my json files, first forward everything, including pose/shape/trans, through the SMPL layer. This gives world coordinates. Then apply the camera extrinsics to obtain camera coordinates.

I did it as you described, but still got wrong camera coordinates: [-514.82013, 4447.08887, 1112.78088]. My smpl_forward() is as follows:

    def forward(self,
                pose_axis_angle,
                betas,
                global_orient,
                transl=None,
                return_verts=True):

        if global_orient is not None:
            full_pose = torch.cat([global_orient, pose_axis_angle], dim=1)
        else:
            full_pose = pose_axis_angle

        # convert the axis-angle thetas to rotation matrices inside lbs()
        pose2rot = True
        # vertices: (B, N, 3), joints: (B, K, 3)
        vertices, joints, rot_mats, joints_from_verts_h36m = lbs(betas, full_pose, self.v_template,
                                                                 self.shapedirs, self.posedirs,
                                                                 self.J_regressor, self.J_regressor_h36m, self.parents,
                                                                 self.lbs_weights, pose2rot=pose2rot, dtype=self.dtype)

        if transl is not None:
            # apply translations
            joints += transl.unsqueeze(dim=1)
            vertices += transl.unsqueeze(dim=1)
            joints_from_verts_h36m += transl.unsqueeze(dim=1)
        else:
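            # no translation provided: fall back to root-relative coordinates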
            vertices = vertices - joints_from_verts_h36m[:, self.root_idx_17, :].unsqueeze(1).detach()
            joints = joints - joints[:, self.root_idx_smpl, :].unsqueeze(1).detach()
            joints_from_verts_h36m = joints_from_verts_h36m - joints_from_verts_h36m[:, self.root_idx_17, :].unsqueeze(1).detach()

        output = ModelOutput(
            vertices=vertices, joints=joints, rot_mats=rot_mats, joints_from_verts=joints_from_verts_h36m)
        return output

The world2cam() is:

def world2cam(world_coord, R, T):
    cam_coord = np.dot(R, world_coord - T)
    return cam_coord

Is there anything wrong with these two steps?

mks0601 commented 1 month ago

Your world2cam is wrong. `cam_coord = np.dot(R, world_coord) + T` is the right one.
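For completeness, a sketch of the corrected helper with the convention spelled out; the original subtract-then-rotate form, R @ (x - T), is only valid when T is the camera center in world coordinates, which would also explain the 't' discrepancy noted earlier if the other parameter set stores the camera center:

    import numpy as np

    def world2cam(world_coord, R, T):
        # T is the world-to-camera translation: x_cam = R @ x_world + T
        cam_coord = np.dot(R, world_coord) + T
        return cam_coord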

YHaooo-4508 commented 1 month ago

> Your world2cam is wrong. `cam_coord = np.dot(R, world_coord) + T` is the right one.

Thank you for the correction. I did a 2D visualization and it was indeed this issue. The 2D visualization is as follows:

(screenshot: 2D keypoints projected onto the input image)

It can be seen that the body points are fairly accurate, but the points on the face and feet are not. Could using SMPL-X make the face and foot points more precise?

mks0601 commented 1 month ago

I don't think so. The SMPL/SMPL-X fits for Human3.6M were obtained by fitting the respective models to the 3D keypoints of Human3.6M, which come from markers attached to the subjects. The markers sparsely cover the body and do not include the feet or face.

YHaooo-4508 commented 1 month ago

Thank you for your patient answer, it has been of great help to me!