I noticed that in the validation/test split provided for COCO there are 5,000 images but only 5,000 captions, whereas 5,000 images should come with 25,000 captions (5 per image). In fact, the last 4,000 images have no corresponding captions, so if we only use the provided captions we are effectively evaluating on a 1,000-image subset.
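For reference, here is the quick sanity check I ran. The file names are only my assumption about where the provided image list and caption list live; adjust them as needed:

```python
# Count non-empty lines in the provided image list and caption list
# (hypothetical file names -- substitute the actual split files).
def count_lines(path):
    with open(path) as f:
        return sum(1 for line in f if line.strip())

n_images = count_lines("coco_test_ims.txt")     # expected: 5000
n_captions = count_lines("coco_test_caps.txt")  # expected: 5 * 5000 = 25000

print(f"{n_images} images, {n_captions} captions "
      f"({n_captions / n_images:.1f} captions per image, 5.0 expected)")
# If this prints 5000 images and 5000 captions, only the first
# 1000 images (5000 / 5) are actually covered by the caption list.
```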
The README says:
Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy.
While Karpathy's paper indicates:
For MSCOCO we use 5,000 images for both validation and testing
Is the original test actually over 1,000 images, or is the provided caption list incomplete?
Thanks,
Armand