Open bearpaw opened 8 years ago
Moreover, I found that the released results on the validation set are slightly worse than the results on the test set:
method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
---|---|---|---|---|---|---|---|---|
valid-example | 95.80 | 94.21 | 87.40 | 82.75 | 86.03 | 81.83 | 78.32 | 86.76 |
valid-ours | 95.94 | 94.68 | 88.53 | 83.38 | 87.48 | 83.09 | 79.05 | 87.56 |
Is the released model different from the model in your arXiv paper? Thanks in advance!
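To make the gap between the two rows of the table above easier to read, here is a minimal sketch that subtracts them column by column. The numbers are copied from the table; the helper name `per_joint_gap` is only illustrative and is not part of the repository.

```python
# Scores (percent) copied from the two rows of the table above.
JOINTS = ["Head", "Shoulder", "Elbow", "Wrist", "Hip", "Knee", "Ankle", "Mean"]
valid_example = [95.80, 94.21, 87.40, 82.75, 86.03, 81.83, 78.32, 86.76]
valid_ours = [95.94, 94.68, 88.53, 83.38, 87.48, 83.09, 79.05, 87.56]


def per_joint_gap(a, b):
    """Return b - a for each column, rounded to two decimals.

    Hypothetical helper for illustration only.
    """
    return {joint: round(y - x, 2) for joint, x, y in zip(JOINTS, a, b)}


gaps = per_joint_gap(valid_example, valid_ours)
for joint, gap in gaps.items():
    # A positive value means valid-ours scores higher in that column.
    print(f"{joint:<9}{gap:+.2f}")
```

In these numbers valid-ours is higher in every column, with the largest difference at the hip.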
valid-ours.h5 is currently a bit outdated, as is the released model. That said, I generally found validation performance to be around 2% worse than test set performance.
valid-example.h5 is the file that gets written to when running the validation demo code, so if someone runs with a different model it was just a way to distinguish it from our baseline performance. I just put an arbitrary set of predictions as filler there so if the evaluation code was run it would show different curves. I didn't think much about it at the time and in hindsight I realize that might be a bit confusing, sorry about that!
@anewell Can you please clarify which model architecture was used to generate valid-ours.h5? Was it the 8-stack hourglass used to produce Table 2 in your paper? I wish to use these results as a reference point during development.
Hi, valid-ours.h5 is from an old version of the model, and does not correspond to the output of the 8-stack network. Sorry for any confusion that might cause. You can download the 8-stack model though and run the evaluation to see how it does.
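For readers who want a rough idea of what such an evaluation computes, below is a minimal sketch of a PCKh-style score, assuming the metric is PCKh@0.5 as commonly reported for MPII. It is not the repository's evaluation code: the function name `pckh`, the input layout, and the toy data are all assumptions, the choice of `<=` at the threshold is arbitrary, and loading the .h5 files is left out because their internal dataset names are not shown in this thread.

```python
import math


def pckh(preds, targets, head_sizes, threshold=0.5):
    """Percentage of joints predicted within threshold * head size.

    preds, targets: one list per image of (x, y) joint coordinates.
    head_sizes: one reference head length per image (assumed input).
    Missing or invisible joints are not handled in this sketch.
    """
    hits = 0
    total = 0
    for pred, target, head in zip(preds, targets, head_sizes):
        for (px, py), (tx, ty) in zip(pred, target):
            dist = math.hypot(px - tx, py - ty)
            if dist <= threshold * head:
                hits += 1
            total += 1
    return 100.0 * hits / total if total else 0.0


# Toy data (made up): two images with two joints each.
preds = [[(10.0, 10.0), (50.0, 52.0)], [(30.0, 30.0), (80.0, 80.0)]]
targets = [[(10.0, 12.0), (50.0, 50.0)], [(30.0, 45.0), (80.0, 81.0)]]
head_sizes = [10.0, 20.0]

score = pckh(preds, targets, head_sizes)
print(f"PCKh@0.5 on toy data: {score:.2f}")
```

Per-joint columns like those in the table would come from applying the same count to one joint index at a time rather than to all joints together.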
Thanks for your reply. I've found a small bug in the evaluation code (see https://github.com/anewell/pose-hg-demo/pull/14), but other than that I am able to generate validation set predictions just fine.
Would you consider accepting a PR to add the validation set predictions into the preds/ folder (e.g. as valid-hg8.h5, or perhaps replacing valid-ours.h5)? It would be very helpful to have an "official" validation prediction set for your highly influential model, and would clear up confusion for people who wrongly believe that valid-ours.h5 represents your peak performance.
Hi, I notice that valid-example.h5 is better than valid-ours.h5. Would you please tell me the difference between these two results? Thank you.