una-dinosauria / 3d-pose-baseline

A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.
MIT License

Question about baseline comparisons in Tables 1 and 3 #59

Closed. macaodha closed this issue 6 years ago.

macaodha commented 6 years ago

I have a quick question regarding the comparison to Moreno-Noguer [27].

If I understand correctly, your Protocol #2 uses all actions and viewpoints, with (S1, S5, S6, S7, S8) for training and (S9, S11) for testing. You compare to Protocol #3 from [27] in Table 3, but in their paper isn't that all actions and viewpoints, with (S1, S5, S6, S7, S8, S9) for training and only every 64th frame of the frontal view of S11 for testing?

Also, I was wondering where you got the GT/CPM result for [27] in your Table 1. I couldn't seem to find it in their paper.

Thanks for your help!

una-dinosauria commented 6 years ago

Hi @macaodha

> If I understand correctly, your Protocol #2 uses all actions and viewpoints, with (S1, S5, S6, S7, S8) for training and (S9, S11) for testing. You compare to Protocol #3 from [27] in Table 3, but in their paper isn't that all actions and viewpoints, with (S1, S5, S6, S7, S8, S9) for training and only every 64th frame of the frontal view of S11 for testing?

There is a large variety in the protocols used in previous work, which differ in one or more of the following (the sketch after the list illustrates points 2 and 3):

  1. the number of joints used (accounted for in our paper)
  2. the amount of training data (e.g. Moreno-Noguer CVPR'17, Chen and Ramanan CVPR'17)
  3. subsampling at test time (e.g. Moreno-Noguer CVPR'17, Bogo et al. ECCV'16), presumably because those approaches take a very long time to run
  4. scale knowledge at test time (arXiv v1 of Pavlakos et al. CVPR'17; this was corrected in arXiv v2 / the CVPR'17 camera-ready)
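
To make points 2 and 3 concrete, here is a minimal sketch (my own illustration, not code from either paper; the 3000-frame sequence is hypothetical) of how the subject splits and test-time subsampling differ between the two protocols discussed above:

```python
# Protocol #2 in our paper: all actions and cameras,
# train on S1, S5, S6, S7, S8; test on S9 and S11.
OUR_TRAIN_SUBJECTS = [1, 5, 6, 7, 8]
OUR_TEST_SUBJECTS = [9, 11]

# Protocol #3 in Moreno-Noguer CVPR'17 [27]: S9 moves into training,
# and testing uses only the frontal view of S11, every 64th frame.
MORENO_TRAIN_SUBJECTS = [1, 5, 6, 7, 8, 9]
MORENO_TEST_SUBJECTS = [11]

def subsample(frames, rate=64):
    """Keep 1 of every `rate` frames, as done at test time in [27]."""
    return frames[::rate]

# A hypothetical 3000-frame test sequence shrinks to 47 frames.
frames = list(range(3000))
print(len(subsample(frames)))  # -> 47
```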

As you might imagine, it is hard to set up all these different experiments and compare against the particular tweaks of other researchers. We instead decided to go for the most straightforward protocol: using both 14 and 17 joints, less training data than some previous work, no test-time subsampling, and no scale knowledge under protocol 1. This is a very bare-bones setup that puts us at a disadvantage, but it turned out to perform quite well nonetheless. I imagine that if you re-ran the experiments under those other setups, our advantage would increase.
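
For context, here is a minimal numpy sketch of the error metric behind these numbers, assuming poses are (n_joints, 3) arrays in millimetres that have already been root-centred; under protocol 1 no ground-truth scale is applied:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joints, for one frame."""
    assert pred.shape == gt.shape  # (n_joints, 3)
    return np.linalg.norm(pred - gt, axis=1).mean()

# Toy usage with 17 joints of synthetic data.
rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
pred = gt + rng.normal(scale=10.0, size=(17, 3))
print(mpjpe(pred, gt))
```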

> Also, I was wondering where you got the GT/CPM result for [27] in your Table 1. I couldn't seem to find it in their paper.

Good catch! We got it from the arXiv version: https://arxiv.org/pdf/1611.09010.pdf. We forgot to update it in the final version.

macaodha commented 6 years ago

Yes, it seems like there are a lot of different ways to compare the results. Thanks for confirming.

macaodha commented 6 years ago

Can I confirm one more thing? I'm trying to understand the SH (Stacked Hourglass) experiments.

In your Table 1, am I correct in saying that GT/SH means "train on ground-truth 2D, test on SH 2D"?

As opposed to Table 3, where Ours (SH detections) (MA) means: "train on SH 2D and test on SH 2D".

una-dinosauria commented 6 years ago

> In your Table 1, am I correct in saying that GT/SH means "train on ground-truth 2D, test on SH 2D"?

Yes, the caption of Table 1 says: "(Bottom) Training on ground truth and testing on the output of a 2d detector." What is the source of confusion here?

> As opposed to Table 3, where Ours (SH detections) (MA) means: "train on SH 2D and test on SH 2D".

Correct. This line can be directly reproduced from the repo. I hope that is clear as well.
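
To spell out the two setups for future readers, here is a hypothetical sketch (the names are mine, not the repo's API):

```python
# Hypothetical sketch of the two 2d-input configurations discussed above;
# "gt" = ground-truth 2d projections, "sh" = Stacked Hourglass detections.

def make_experiment(train_2d_source, test_2d_source):
    """Record which 2d inputs are used for training and for testing."""
    assert train_2d_source in ("gt", "sh") and test_2d_source in ("gt", "sh")
    return {"train_2d": train_2d_source, "test_2d": test_2d_source}

# Table 1 (bottom): train on ground-truth 2d, test on SH detections.
gt_sh = make_experiment("gt", "sh")

# Table 3, "Ours (SH detections) (MA)": SH detections for both.
sh_sh = make_experiment("sh", "sh")

print(gt_sh, sh_sh)
```

If I remember the repo's flags correctly, the SH/SH line is what you get when training with --use_sh.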

macaodha commented 6 years ago

Makes sense, thanks!