mkocabas / EpipolarPose

Self-Supervised Learning of 3D Human Pose using Multi-view Geometry (CVPR2019)

Is it fair to use GT intrinsic camera parameters in post-processing? #15

Closed John6333 closed 5 years ago

John6333 commented 5 years ago

https://github.com/mkocabas/EpipolarPose/blob/9b1d316f1dc00115abe5b56ba664055a70a3a135/lib/utils/cameras.py#L64
https://github.com/mkocabas/EpipolarPose/blob/9b1d316f1dc00115abe5b56ba664055a70a3a135/lib/core/inference.py#L115

It seems that you use the GT intrinsic parameters and the GT root depth to unproject the 2D predictions to 3D. Is it fair to claim that your method is SOTA when it relies on GT camera parameters?

mkocabas commented 5 years ago

Dear @John6333,

We use Integral Pose as the baseline model for our experiments, and the evaluation function is taken from their code. Our results throughout the paper are therefore fair relative to the baseline we use.

Regarding the fairness of the overall results: unfortunately, there is no standard, official way of evaluating performance on the Human3.6M dataset. Currently, there are two main protocols, P1 and P2, both measured on subject 9 and subject 11. However, the way people compute these metrics differs a lot, and whether GT camera parameters are used is one of those differences. Most of the time, that choice depends on the method itself. If a method is capable of predicting the scale (see Martinez et al. and this issue opened by me), then it does not need the GT parameters. Otherwise, as in the case of Integral Pose, it does.
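
To make the dependence on GT parameters concrete, here is a minimal sketch of a standard pinhole back-projection, assuming the network outputs pixel coordinates plus a depth relative to the root joint. It is not a copy of the repository's code; the function names and the `(u, v, d)` layout are hypothetical and chosen only for illustration.

```python
def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project one pixel (u, v) at absolute depth z into camera coordinates.

    fx, fy are focal lengths in pixels and cx, cy is the principal point;
    these are the intrinsic parameters discussed in this issue.
    """
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def unproject_pose(joints_uvd, root_depth, fx, fy, cx, cy):
    """Lift a pose to 3D camera coordinates.

    joints_uvd: list of (u, v, d) tuples, where d is the joint depth
    relative to the root joint (an assumed output format).
    root_depth: absolute depth of the root joint, which has to come from
    somewhere; here it is simply passed in.
    """
    return [
        pixel_to_camera(u, v, root_depth + d, fx, fy, cx, cy)
        for (u, v, d) in joints_uvd
    ]
```

Both the intrinsics and `root_depth` enter the result directly, so a method that predicts only pixel coordinates and relative depth cannot produce metric 3D coordinates without them.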

As another example of inconsistent evaluation, most papers sample the test set at different frequencies: some use all frames, while others sample every 10th or every 64th frame.
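
As a rough illustration of why the sampling frequency matters, the sketch below averages a per-frame joint error over every `step`-th frame. The helper names are hypothetical and the metric is a plain mean per-joint position error with no alignment step, which is an assumption rather than any paper's exact protocol.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints of one pose."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def mean_error(preds, gts, step=1):
    """Average the per-frame error over every `step`-th frame.

    step=1 uses all frames; step=10 or step=64 mimics the subsampled
    evaluations mentioned above.
    """
    errors = [mpjpe(p, g) for p, g in zip(preds[::step], gts[::step])]
    return sum(errors) / len(errors)
```

Calling `mean_error(preds, gts, step=1)` and `mean_error(preds, gts, step=64)` on the same predictions will generally give different numbers, so results reported with different sampling are not directly comparable.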

mkocabas commented 5 years ago

Closing this issue, feel free to reopen if you need.