kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License

Reproduction of SPICE results (eval_captioning.py) #14

Closed by Eladhi 4 years ago

Eladhi commented 4 years ago

Hi, nice work - thanks for sharing the code.

I'm trying to reproduce the CIDEr and SPICE results shown in Figure 4 of your paper. I load the pre-trained models (specifically, those corresponding to H=1024 and H=2048 in the width ablation) and run eval_captioning.py after building the vocabulary. The values I get are much lower than those in Figure 4, which suggests an inconsistency in the pre- or post-processing. Should I expect to get the same values in this experiment? If so, is there any change I should make?

kdexd commented 4 years ago

Hi @Eladhi! Thanks for trying out the code. I can confirm the issue: the bug was introduced in the recent commit 21b317b, where I renamed some parameters. I will look into it over the coming weekend and let you know.

kdexd commented 4 years ago

Fixed in 3b6d628 — I can reproduce the results now. Feel free to re-open this if you face any problems @Eladhi!