Scale of the person

umariqb commented 5 years ago

Great work Muhammed!! I just wanted to know what kind of ground-truth information is used during inference. I can see that the ground-truth depth of the root keypoint is used during back-projection, but what about the scale? In the paper, you mention that the "model uses normalized poses as ground truth" for training. How do you recover the scale during inference? Thanks!

mkocabas commented 5 years ago

Hi @iqbalu,

Thanks for the appreciation!

We have two training settings: one with known camera extrinsic parameters (i.e. rotation and translation) and one where they are unknown. In the known-parameters case, we use the ground-truth depth information, as you mentioned. This is identical to what the Integral Pose authors did. So, in this setting, scale is recovered from the depth and the camera parameters.
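To illustrate how a known root depth fixes the scale, here is a minimal sketch of pinhole back-projection. It is not code from this repository: the function names, the argument layout, and the assumption that the network outputs pixel coordinates plus a depth offset relative to the root joint are all hypothetical choices made for the example.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth into camera coordinates.

    Uses the standard pinhole model; fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. The result is in the units of `depth`.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def recover_metric_pose(joints_2d, rel_depths, root_depth, fx, fy, cx, cy):
    """Hypothetical helper: lift 2D joints to 3D using ground-truth root depth.

    joints_2d  : list of (u, v) pixel coordinates, one per joint
    rel_depths : per-joint depth offsets relative to the root joint (assumed)
    root_depth : ground-truth depth of the root joint
    """
    return [
        back_project(u, v, root_depth + dz, fx, fy, cx, cy)
        for (u, v), dz in zip(joints_2d, rel_depths)
    ]
```

Because every joint is scaled by its absolute depth, the metric size of the person follows directly from the root depth and the intrinsics, with no separate scale estimate.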

In the unknown-parameters case, metric translation isn't available, since we rely on standard essential matrix estimation to create pseudo-ground-truth labels, and that recovers translation only up to an unknown scale. We are not doing anything special to recover scale during inference. To compare performance, we use Normalized MPJPE, identical to [1]. Following their strategy, we compute the scale between predicted_pose and GT_pose using Procrustes alignment, then compute MPJPE between the scaled predicted_pose and GT_pose.
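The scale-then-compare evaluation described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's evaluation code: it uses the closed-form least-squares scale only (no rotation or translation, so it is simpler than a full Procrustes alignment), it assumes both poses are already root-centered, and the function names are hypothetical.

```python
import math


def optimal_scale(pred, gt):
    """Least-squares scale s minimizing sum ||s * pred_j - gt_j||^2.

    Closed form: s = <pred, gt> / <pred, pred>.
    Assumes `pred` is not all zeros (otherwise the denominator is zero).
    """
    num = sum(p * g for pj, gj in zip(pred, gt) for p, g in zip(pj, gj))
    den = sum(p * p for pj in pred for p in pj)
    return num / den


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def nmpjpe(pred, gt):
    """Normalized MPJPE: rescale the prediction, then compute MPJPE."""
    s = optimal_scale(pred, gt)
    scaled = [[s * c for c in joint] for joint in pred]
    return mpjpe(scaled, gt)
```

With this metric, a prediction that matches the ground truth up to a global scale factor gets zero error, which is why it suits a model that does not recover absolute scale.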

By the way, Integral Pose uses ground-truth labels normalized to the [-0.5, 0.5] range in all dimensions. This is why we mention that the "model uses normalized poses as ground truth".

I hope the explanation is clear enough.

Best!

[1] Rhodin et al., Learning Monocular 3D Human Pose Estimation from Multi-view Images, CVPR 2018.

umariqb commented 5 years ago

Hi @mkocabas,

Thanks a lot for the detailed answers. Yes, everything is clear now; I had misunderstood the term "scale normalization" in the context of Integral Pose. You may close the issue now.

Thanks, Umar

mkocabas commented 5 years ago

Great!

mkocabas / EpipolarPose

Scale of the person #3