hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
GNU General Public License v3.0
750 stars 182 forks

fix vqa score computation #18

Closed mcogswell closed 6 years ago

mcogswell commented 6 years ago

It looks like the VQA score isn't computed quite as specified by the VQA evaluation metric; it underestimates the actual VQA score a bit. This pull request fixes that. Note that it requires re-caching the labels by running tools/compute_softscore.py again. When I ran this on some models I've been testing, most scores went up by about 0.9.
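For context, the headline VQA formula scores a predicted answer against the 10 human answers as min(#matches / 3, 1). A minimal sketch of that per-answer formula (my own illustration, not code from this PR):

```python
def vqa_accuracy_naive(pred_answer, human_answers):
    """Headline VQA formula: an answer counts as fully correct if at
    least 3 of the 10 annotators gave it. (This ignores the
    10-choose-9 averaging clause discussed below.)"""
    matches = sum(ans == pred_answer for ans in human_answers)
    return min(matches / 3.0, 1.0)

# e.g. 2 of 10 annotators answered "blue":
print(vqa_accuracy_naive("blue", ["blue", "blue"] + ["red"] * 8))  # ~0.667
```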

hengyuan-hu commented 6 years ago

You ignored this line: "In order to be consistent with ‘human accuracies’, machine accuracies are averaged over all 10 choose 9 sets of human annotators." If you actually average over all 10 choose 9 sets, you will see that the hardcoded values are correct.
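For anyone reading later, here is a sketch (my own, not repo code) of the leave-one-out averaging quoted above. Averaging min(count / 3, 1) over all 10 choose 9 annotator subsets reproduces the hardcoded soft scores (0.3, 0.6, 0.9, 1.0 for 1-4 matching annotators, which is what tools/compute_softscore.py hardcodes, if I read it right):

```python
from itertools import combinations

def vqa_soft_score(k, n=10):
    """Average min(count / 3, 1) over all n-choose-(n-1) leave-one-out
    subsets of annotators, where k of the n annotators gave the answer."""
    votes = [1] * k + [0] * (n - k)   # 1 = annotator gave this answer
    subsets = combinations(votes, n - 1)  # all C(10, 9) = 10 subsets
    return sum(min(sum(s) / 3.0, 1.0) for s in subsets) / n

print([round(vqa_soft_score(k), 4) for k in range(5)])
# [0.0, 0.3, 0.6, 0.9, 1.0] -- matches the hardcoded values
```

The intuition: for an answer given by k annotators, k of the 10 subsets drop one of its voters (count k-1) and the other 10-k keep all k voters, so the average is [k·min((k-1)/3, 1) + (10-k)·min(k/3, 1)] / 10, which comes out to exactly 0.3, 0.6, 0.9, and 1.0 for k = 1..4.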

mcogswell commented 6 years ago

Hmmm. Yup, that makes sense. Thanks for the code.