What is the difference between valid-ours.h5 and valid-example.h5

princeton-vl / pose-hg-demo

Code to test and use the model from "Stacked Hourglass Networks for Human Pose Estimation"

BSD 3-Clause "New" or "Revised" License


What is the difference between valid-ours.h5 and valid-example.h5 #1

Open bearpaw opened 8 years ago

bearpaw commented 8 years ago

Hi, I notice that valid-example.h5 is better than valid-ours.h5. Would you please tell me the difference between these two results? Thank you.

bearpaw commented 8 years ago

Moreover, I found that the released results on the validation set are slightly worse than the results on the test set:

| Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
|---|---|---|---|---|---|---|---|---|
| valid-example | 95.80 | 94.21 | 87.40 | 82.75 | 86.03 | 81.83 | 78.32 | 86.76 |
| valid-ours | 95.94 | 94.68 | 88.53 | 83.38 | 87.48 | 83.09 | 79.05 | 87.56 |

Is the released model different from the model in your arXiv paper? Thanks in advance!
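For readers comparing the two rows, the per-column gap can be worked out directly. This is only a small arithmetic sketch over the numbers posted in the table; the variable and function names (`valid_example`, `valid_ours`, `per_column_gap`) are made up for illustration and are not part of the repository.

```python
# Scores copied from the table in this comment (columns in the same order).
COLUMNS = ["Head", "Shoulder", "Elbow", "Wrist", "Hip", "Knee", "Ankle", "Mean"]
valid_example = [95.80, 94.21, 87.40, 82.75, 86.03, 81.83, 78.32, 86.76]
valid_ours = [95.94, 94.68, 88.53, 83.38, 87.48, 83.09, 79.05, 87.56]


def per_column_gap(baseline, other):
    """Return other - baseline for each column, rounded to two decimals."""
    return [round(o - b, 2) for b, o in zip(baseline, other)]


# Positive values mean valid-ours scores higher than valid-example.
gaps = per_column_gap(valid_example, valid_ours)
for name, gap in zip(COLUMNS, gaps):
    print(f"{name:<9}{gap:+.2f}")
```

In the table as posted, valid-ours is higher in every column, with the largest gaps at Hip and Knee.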

anewell commented 8 years ago

valid-ours.h5 is currently a bit outdated, as is the released model. With that said, I generally found validation performance to be around 2% worse than test set performance.

valid-example.h5 is the file that gets written to when running the validation demo code, so if someone runs it with a different model, their results stay distinct from our baseline performance. I just put an arbitrary set of predictions there as filler so that running the evaluation code would show different curves. I didn't think much about it at the time, and in hindsight I realize that might be a bit confusing, sorry about that!

anibali commented 7 years ago

@anewell Can you please clarify which model architecture was used to generate valid-ours.h5? Was it the 8-stack hourglass used to produce Table 2 in your paper? I wish to use these results as a reference point during development.

anewell commented 7 years ago

Hi, valid-ours.h5 is from an old version of the model, and does not correspond to the output of the 8-stack network. Sorry for any confusion that might cause. You can download the 8-stack model though and run the evaluation to see how it does.
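For anyone who wants to sanity-check a prediction file outside the bundled evaluation script, here is a minimal sketch of a PCKh-style score in NumPy. It assumes the per-joint numbers in this thread are PCKh@0.5 (a joint counts as correct when its error is at most 0.5 of the head size), which the thread itself does not state. The function names, the array shapes, and the `preds` dataset key in the loader are also assumptions, not the repository's actual evaluation code.

```python
import numpy as np


def pckh(preds, gt, head_sizes, threshold=0.5, visible=None):
    """Per-joint percentage of predictions within threshold * head size.

    preds, gt:   (N, J, 2) arrays of x, y coordinates (assumed layout)
    head_sizes:  (N,) array with one normalising length per sample
    visible:     optional (N, J) boolean mask of joints to score
    """
    preds = np.asarray(preds, dtype=float)
    gt = np.asarray(gt, dtype=float)
    head_sizes = np.asarray(head_sizes, dtype=float)

    # Euclidean error per joint, normalised by each sample's head size.
    dists = np.linalg.norm(preds - gt, axis=2) / head_sizes[:, None]

    if visible is None:
        visible = np.ones(dists.shape, dtype=bool)
    visible = np.asarray(visible, dtype=bool)

    correct = (dists <= threshold) & visible
    counts = np.maximum(visible.sum(axis=0), 1)  # avoid division by zero
    return 100.0 * correct.sum(axis=0) / counts


def load_preds(path, key="preds"):
    """Read a prediction array from an HDF5 file.

    The dataset name "preds" is an assumption; inspect the file's keys first.
    """
    import h5py  # imported lazily so pckh() works without h5py installed

    with h5py.File(path, "r") as f:
        return f[key][:]
```

The official evaluation may differ in details such as which joints are scored and how head size is measured, so numbers from this sketch should be treated as a rough cross-check only.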

anibali commented 7 years ago

Thanks for your reply. I've found a small bug in the evaluation code (see https://github.com/anewell/pose-hg-demo/pull/14), but other than that I am able to generate validation set predictions just fine.

Would you consider accepting a PR to add the validation set predictions to the preds/ folder (e.g. as valid-hg8.h5, or perhaps replacing valid-ours.h5)? It would be very helpful to have an "official" validation prediction set for your highly influential model, and it would clear up confusion for people who wrongly believe that valid-ours.h5 represents your peak performance.