jiasenlu / AdaptiveAttention

Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
https://arxiv.org/abs/1612.01887

Performance on Flickr30k Dataset #25

Open atg93 opened 4 years ago

atg93 commented 4 years ago

Hi, I used your pretrained model on the Flickr30k dataset, but the performance is much worse than the numbers reported in the paper:

Bleu_1: 0.206
Bleu_2: 0.122
Bleu_3: 0.077
Bleu_4: 0.051
METEOR: 0.108
ROUGE_L: 0.278
CIDEr: 0.377
SPICE: 0.161

Could you please check the pretrained model? Also, after creating the h5 and json files for Flickr30k, the vocabulary size did not match the parameters of the pretrained model, so I had to set word_count_threshold to 4.
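On the word_count_threshold point: in neuraltalk2-style preprocessing, the threshold determines the vocabulary size, and that in turn fixes the shapes of the word-embedding and output layers, which must match the checkpoint exactly. A minimal sketch of that dependency, under the assumption of the usual rare-word-to-UNK scheme (the function name and the sample captions are hypothetical, not the repo's actual code):

```python
from collections import Counter

def build_vocab(captions, word_count_threshold):
    # Keep words that occur at least `word_count_threshold` times;
    # rarer words collapse into a single UNK token.
    counts = Counter(w for cap in captions for w in cap.lower().split())
    vocab = [w for w, n in counts.items() if n >= word_count_threshold]
    return vocab + ['UNK']

# Hypothetical usage: each threshold yields a different vocab size,
# and the pretrained checkpoint's embedding/softmax layers are sized
# for exactly one of them.
captions = ["a dog runs in the park", "a dog jumps over a log"]
for threshold in (1, 2, 4, 5):
    print(threshold, len(build_vocab(captions, threshold)))
```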
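The metrics above look like output from the standard coco-caption evaluation. A minimal sketch of reproducing them with the pycocoevalcap package (the sample data here is made up; gts and res map image ids to lists of tokenized caption strings, and METEOR additionally requires a local Java runtime):

```python
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Ground-truth references and generated captions, keyed by image id.
gts = {0: ["a dog runs in the park", "a dog is running outside"]}
res = {0: ["a dog runs through the grass"]}

scorers = [(Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
           (Meteor(), "METEOR"),
           (Rouge(), "ROUGE_L"),
           (Cider(), "CIDEr")]

for scorer, name in scorers:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```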

Akashtyagi commented 4 years ago

Were you able to improve the performance any further? The results you mentioned look too low to be worth digging into.