mks0601 / I2L-MeshNet_RELEASE

Official PyTorch implementation of "I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image", ECCV 2020
MIT License

There seem to be erroneous or offset 3D GT poses on the H36M test set #76

Open ShirleyMaxx opened 3 years ago

ShirleyMaxx commented 3 years ago

Hi, @mks0601

Thanks for your excellent work and code.

However, when I checked the fitting results on the H36M test set (S9 & S11), it seems that several samples have totally wrong or somewhat offset 3D joints joint_cam, and therefore a wrong mesh mesh_cam. E.g., could you please double-check this sample: data/Human36M/images/s_09_act_05_subact_02_ca_02/s_09_act_05_subact_02_ca_02_001201.jpg? Its joint_cam looks wrong (first attached image), and its mesh_cam is also wrong (second attached image).

So could you please help me double-check?

Also, for some other samples, like data/Human36M/images/s_09_act_05_subact_02_ca_03/s_09_act_05_subact_02_ca_03_000501.jpg, the projected 2D joints seem to have an offset, which may be related to the damaged scenes mentioned in "Learnable Triangulation of Human Pose" (ICCV 2019, oral)?
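For context, the 2D overlay I look at is just the standard pinhole projection of joint_cam with the camera intrinsics (a minimal sketch, not the repo's exact code; f and c here are assumed to be the focal length and principal point from the camera annotation):

import numpy as np

def cam2pixel_sketch(joint_cam, f, c):
    # joint_cam: (J, 3) camera-space joints in mm; f, c: (2,) focal length / principal point
    x = joint_cam[:, 0] / joint_cam[:, 2] * f[0] + c[0]
    y = joint_cam[:, 1] / joint_cam[:, 2] * f[1] + c[1]
    return np.stack([x, y], axis=1)  # (J, 2) pixel coordinates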

Thank you so much! Looking forward to your reply!

mks0601 commented 3 years ago

Hi, does joint_cam mean the 3D joint coordinates provided in Human3.6M, or 3D joint coordinates from a 3D mesh?

mks0601 commented 3 years ago

Your visualization script or data loading script would be helpful

ShirleyMaxx commented 3 years ago

Thanks for your prompt reply.

joint_cam means the 3D joint coordinates processed at https://github.com/mks0601/I2L-MeshNet_RELEASE/blob/5d495593fa99e3e44af0289964a7da7284fd9876/data/Human36M/Human36M.py#L93.
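(For reference, my understanding is that this line applies the usual world-to-camera rigid transform; the sketch below is what I assume it does, with R and t taken from the camera annotation, not the exact repo code.)

import numpy as np

def world2cam_sketch(joint_world, R, t):
    # joint_world: (J, 3) world coordinates; R: (3, 3) rotation; t: (3,) translation
    return np.dot(R, joint_world.T).T + t.reshape(1, 3)  # (J, 3) camera coordinates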

The visualization script can be inserted just before https://github.com/mks0601/I2L-MeshNet_RELEASE/blob/5d495593fa99e3e44af0289964a7da7284fd9876/data/Human36M/Human36M.py#L153:

if "data/Human36M/images/s_09_act_05_subact_02_ca_02/s_09_act_05_subact_02_ca_02_001201.jpg" in img_path:
     transform = transforms.Compose([          
          transforms.ToTensor(),
          transforms.Normalize(
          mean=[0.485, 0.456, 0.406],
          std=[0.229, 0.224, 0.225]),
      ])
      img = load_img(img_path)
      assert self.data_split == 'test'
      img, img2bb_trans, bb2img_trans, rot, do_flip = augmentation(img, bbox, self.data_split)
      assert not do_flip
      img_copy = img.copy()
      img = transform(img.astype(np.uint8))
      vis_joints_3d(img.numpy()[np.newaxis, ...], joint_cam[np.newaxis, ...], None, file_name='vis_joint3dcam_gt_%d.jpg'%(image_id), draw_skeleton=True, nrow=1, ncol=2)

Here, vis_joints_3d can be added in vis.py like below (it needs a few other constants and helper functions, included at the top):

import os.path as osp

import cv2
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the '3d' projection
from config import cfg  # cfg must define vis_joints3d_dir (an output directory added to the config)

IMAGENET_MEAN, IMAGENET_STD = np.array([0.485, 0.456, 0.406]), np.array([0.229, 0.224, 0.225])
CONNECTIVITY_DICT = {
    "human36m": [(0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)],
}

def denormalize_image(image):
    """Reverse of the normalize_image() function."""
    return np.clip(255 * (image * IMAGENET_STD + IMAGENET_MEAN), 0, 255)

def vis_joints_3d(batch_image, batch_joints, batch_joints_vis, file_name, draw_skeleton=False, batch_image_path=None, batch_trans=None, nrow=4, ncol=6, size=5, padding=2):
    '''
    batch_image: [batch_size, channel, height, width]
    batch_joints: [batch_size, num_joints, 3]
    batch_joints_vis: [batch_size, num_joints, 1]
    '''
    assert batch_joints.shape[2] == 3, 'check batch_joints'
    plt.close('all')
    fig = plt.figure(figsize=(ncol*size, nrow*size))
    connectivity = CONNECTIVITY_DICT['human36m']

    for row in range(nrow):
        for col in range(ncol):
            # each sample occupies two adjacent columns: the image on the left, the 3D joints on the right
            batch_idx = col//2 + row*(ncol//2)
            if isinstance(batch_image, np.ndarray):
                img = batch_image[batch_idx]
                img = denormalize_image(np.transpose(img.copy(), (1, 2, 0))).astype(np.uint8)   # C*H*W -> H*W*C
            else:
                img = cv2.imread(batch_image_path[batch_idx], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION).copy().astype(np.uint8)
                # img = cv2.warpAffine(img, batch_trans[col].numpy(), (cfg.model.input_shape[1], cfg.model.input_shape[0]), flags=cv2.INTER_LINEAR)
                img = img[..., ::-1]   # BGR -> RGB

            joints = batch_joints[batch_idx]
            if col % 2 == 0:  # draw image
                ax = fig.add_subplot(nrow, ncol, row * ncol + col + 1)
                ax.imshow(img)
            else:   # draw 3d joints
                ax = fig.add_subplot(nrow, ncol, row * ncol + col + 1, projection='3d')
                ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2], s=5, c='black', edgecolors='black')
                if draw_skeleton:
                    for i, jt in enumerate(connectivity):
                        xs, ys, zs = [np.array([joints[jt[0], j], joints[jt[1], j]]) for j in range(3)]
                        ax.plot(xs, ys, zs, lw=5, ls='-', c='blue', solid_capstyle='round')

    save_path = osp.join(cfg.vis_joints3d_dir, file_name)
    plt.savefig(save_path, dpi=100, bbox_inches='tight', pad_inches=0.5)
    plt.show()

    plt.close('all')
    return

As for the visualization of smpl_mesh_cam (https://github.com/mks0601/I2L-MeshNet_RELEASE/blob/5d495593fa99e3e44af0289964a7da7284fd9876/data/Human36M/Human36M.py#L301), you can use vis_joints_3d too, replacing batch_joints with the mesh vertices.
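For example, a call like this (assuming smpl_mesh_cam holds the Nx3 mesh vertices at that point of the loader, and reusing the same img/image_id as in the snippet above) works with the same helper, just with the skeleton drawing turned off:

vis_joints_3d(img.numpy()[np.newaxis, ...], smpl_mesh_cam[np.newaxis, ...], None,
              file_name='vis_mesh3dcam_gt_%d.jpg' % (image_id),
              draw_skeleton=False, nrow=1, ncol=2)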

So, could you please help me check the joints and mesh again, e.g., for the above two samples?

Thank you so much!

mks0601 commented 3 years ago

I see. joint_cam is from the Human3.6M dataset. The wrong one might be due to a failure of their motion capture process. Did you find that many joint_cam suffer from this problem, or only very few?

ShirleyMaxx commented 3 years ago

Hmm, I randomly plotted half of the test set, and it seems that about 100 samples have a wrong joint_cam, and thus a wrong mesh_cam. Most of them are in a continuous sequence. Some of them are totally wrong (e.g., the action), but some of them seem to have just a small translation/offset.

However, when I processed the raw H36M data before, there seemed to be no failure of their motion capture process, except for the issue mentioned above (the damaged scene, only in 2 actions of S9); but this time, S11 also has wrong joint_cam.

So I think the problem may be in the annotation.zip you provided?

mks0601 commented 3 years ago

(attached image: s_09_act_05_subact_02_ca_02_001201)

This is data/Human36M/images/s_09_act_05_subact_02_ca_02/s_09_act_05_subact_02_ca_02_001201.jpg on my side. Could you check that you are using the correct image?

ShirleyMaxx commented 3 years ago

Ohhhhhh my god! It's a different image on my side! I should have checked my data! Thank you so much!

mks0601 commented 3 years ago

Good for you. If you are going to download the images again from here, make sure to check the md5sum.
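For example, something like this computes the checksum to compare against the values listed on the download page (the file name below is just a placeholder):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # stream the file in chunks so large archives do not need to fit in memory
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

print(md5sum('downloaded_archive.tar'))  # placeholder path; compare with the published md5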

ShirleyMaxx commented 3 years ago

okay! I'm downloading now! Thank you so much!

ZOUKaifeng commented 3 years ago

Hello,

Please allow me to ask a question that is not related to this issue. I want to know whether your dataset was downloaded from Google Drive. If so, how did you unzip it? I always encounter the error "does not look like a tar archive". I would be grateful for a reply.

Thank you!