nkolot / GraphCMR

Repository for the paper "Convolutional Mesh Regression for Single-Image Human Shape Reconstruction"
BSD 3-Clause "New" or "Revised" License

Training on Human3.6M produces upside-down results #16

Closed Maqingyang closed 5 years ago

Maqingyang commented 5 years ago

I have trained the model on Human3.6M and UP-3D. The model produces reasonable results on UP-3D, but upside-down results on Human3.6M, where the head keypoint is always wrong.

[screenshots: reconstruction results on Human3.6M appear upside down]

I have checked my h36m_train.npz using the following code (h36m_train.npz is loaded by a class H36MDataset(), similar to FullDataset()). In the code below, I use batch = dataset[2] to get the image and annotation batch directly, which already includes the flip, rotation, and noise augmentations. Note that I have replaced self.to_lsp = list(range(14)) with self.to_h36m = list(range(13)) + [18], but this member does not seem to be used in the training code, so it may not be the cause (I'm not sure).


# NOTE: import paths below are guesses; adjust them to your project layout.
import json
from collections import namedtuple

import matplotlib.pyplot as plt
import torch
from torchvision.utils import make_grid

from datasets import H36MDataset
from models import SMPL
from utils.renderer import Renderer, visualize_reconstruction


class Visualize(object):
    def __init__(self, options):
        self.options = options
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # SMPL model and renderer for visualization
        self.smpl = SMPL().to(self.device)
        self.renderer = Renderer(faces=self.smpl.faces.cpu().numpy())
        # LSP indices from the full list of keypoints
        self.to_lsp = list(range(14))
        # H36M indices: the first 13 LSP joints plus the head top (index 18)
        self.to_h36m = list(range(13)) + [18]

    def vis(self, input_batch):
        input_batch = {k: v.to(self.device) if isinstance(v, torch.Tensor) else v
                       for k, v in input_batch.items()}
        rend_imgs = []

        img = input_batch['img_orig'].cpu().numpy().transpose(1, 2, 0)  # (H, W, C)
        gt_keypoints_2d = input_batch['keypoints'].cpu().numpy()
        gt_keypoints_2d_ = gt_keypoints_2d[self.to_h36m]
        gt_pose = torch.unsqueeze(input_batch['pose'], 0)
        gt_betas = torch.unsqueeze(input_batch['betas'], 0)
        gt_vertices = self.smpl(gt_pose, gt_betas)

        vertices = gt_vertices[0].cpu().numpy()

        # Reuse the GT 2D keypoints as the "predicted" ones, with a dummy camera
        pred_keypoints_2d_ = gt_keypoints_2d_[:, :2]
        pred_camera = torch.unsqueeze(torch.Tensor([0, 0, 0]), 0)
        cam = pred_camera[0].cpu().numpy()

        rend_img = visualize_reconstruction(img, self.options.img_res,
                                            gt_keypoints_2d_, vertices,
                                            pred_keypoints_2d_, cam, self.renderer)
        # make_grid expects (C, H, W) tensors; visualize_reconstruction is
        # assumed to return an (H, W, C) image here.
        rend_imgs.append(torch.from_numpy(rend_img).permute(2, 0, 1))

        rend_imgs = make_grid(rend_imgs, nrow=1)
        plt.imshow(rend_imgs.permute(1, 2, 0))

        plt.savefig('others/h36m_vis.png')


with open('/hpn/logs/paper_step1/config.json') as f:
    json_args = json.load(f)
    json_args = namedtuple('json_args', json_args.keys())(**json_args)
options = json_args
dataset = H36MDataset(options)

batch = dataset[2]  # already includes flip, rotation and noise augmentation

visualizer = Visualize(options)
visualizer.vis(batch)

From this visualization, the ground-truth 2D pose seems correct, and the data preprocessing (flip and rotation) also seems correct.

[screenshots: ground-truth 2D keypoints and SMPL mesh visualized on training samples]

So I suspect the problem lies in the training code. Could you help me locate where it may come from? Thank you very much!

I can provide any other details on request.

Maqingyang commented 5 years ago

BTW, I haven't figured out why the 3D joints have size (B, 38, 3) in models/smpl.py, or what the difference is between J_regressor and J_regressor_extra.

        """
        This method is used to get the joint locations from the SMPL mesh
        Input:
            vertices: size = (B, 6890, 3)
        Output:
            3D joints: size = (B, 38, 3)
        """
        joints = torch.einsum('bik,ji->bjk', [vertices, self.J_regressor])
        joints_extra = torch.einsum('bik,ji->bjk', [vertices, self.J_regressor_extra])
        joints = torch.cat((joints, joints_extra), dim=1)
        joints = joints[:, cfg.JOINTS_IDX]
        return joints

Maqingyang commented 5 years ago

More detail

I didn't use

# in config.py
# Indices to get the 14 LSP joints from the ground truth joints
J24_TO_J14 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18]

when constructing H36MDataset(), but I see it is used in eval.py. Do you use it in your original code?

nkolot commented 5 years ago

First, you might have to flip the SMPL pose as described here. This is probably why you are getting flipped shapes. The network tries both to match the output shape with the target shape and to align it with the 2D keypoints, which is why you are getting these "spiky" shapes as output.
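
Roughly, flipping an axis-angle SMPL pose for a horizontally flipped image means permuting the left/right joints and negating the y and z components of each axis-angle vector. A minimal sketch of that idea, assuming the standard 24-joint SMPL ordering (double-check the permutation against the repo's own flipping utility before relying on it):

import numpy as np

def flip_smpl_pose(pose):
    """Flip a 72-dim axis-angle SMPL pose to match a horizontally flipped image.
    Sketch only: the left/right joint permutation below assumes the standard
    SMPL joint ordering and is not copied from the repo.
    """
    # Swap left/right joints (hips, knees, ankles, feet, collars, shoulders,
    # elbows, wrists, hands), then expand to the 3 axis-angle dims per joint.
    flipped_parts = [0, 2, 1, 3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 14, 13, 15,
                     17, 16, 19, 18, 21, 20, 23, 22]
    idx = np.array([[3 * i, 3 * i + 1, 3 * i + 2] for i in flipped_parts]).flatten()
    pose = pose[idx]
    # Mirror each rotation: negate the y and z axis-angle components.
    pose[1::3] = -pose[1::3]
    pose[2::3] = -pose[2::3]
    return pose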

The size of the joints is 38 because different datasets have different joint definitions, so we take a superset of all those joints and apply the losses only on the relevant joints.
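
In practice this just means carrying a per-joint confidence/visibility weight with each training example and multiplying it into the loss, so joints a dataset does not annotate contribute zero. A minimal sketch of that idea (names and shapes are illustrative, not the repo's exact loss code):

import torch

def masked_joint_loss(pred_joints, gt_joints, conf):
    """Confidence-weighted L2 loss over the joint superset.
    pred_joints: (B, J, 3) predicted 3D joints
    gt_joints:   (B, J, 3) ground-truth joints (zeros where unannotated)
    conf:        (B, J) per-joint weight, 0 for joints a dataset lacks
    """
    conf = conf.unsqueeze(-1)  # (B, J, 1), broadcasts over x, y, z
    return (conf * (pred_joints - gt_joints) ** 2).mean()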

J24_TO_J14 gets the 14 LSP joints from the 24 SMPL joints and is used to evaluate MPJPE on H36M. During training we apply the joint loss on 17 joints, but we evaluate only on 14.
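
For concreteness, a rough sketch of how J24_TO_J14 typically enters the MPJPE computation at evaluation time (the hip-centering convention below is an assumption for illustration, not a verbatim copy of eval.py):

import numpy as np

# Indices to get the 14 LSP joints from the 24 ground-truth joints
J24_TO_J14 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18]

def mpjpe_h36m(pred_joints, gt_joints):
    """Mean per-joint position error over the 14 LSP joints.
    pred_joints, gt_joints: (B, 24, 3) arrays of 3D joints (same units).
    """
    pred = pred_joints[:, J24_TO_J14]
    gt = gt_joints[:, J24_TO_J14]
    # Center both skeletons on the hip midpoint before measuring
    # (LSP indices 2 and 3 are the right/left hips).
    pred = pred - pred[:, 2:4].mean(axis=1, keepdims=True)
    gt = gt - gt[:, 2:4].mean(axis=1, keepdims=True)
    return np.linalg.norm(pred - gt, axis=-1).mean()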

Maqingyang commented 5 years ago

Thank you! It works now!