ryankiros / visual-semantic-embedding

Implementation of the image-sentence embedding method described in "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

COCO's val/test sets captions incomplete? #8

Open armandvilalta opened 7 years ago

armandvilalta commented 7 years ago

I noticed that the validation / test data provided for COCO contain 5000 images and 5000 captions, whereas 5000 images should come with 25000 captions (5 per image). In fact, the last 4000 images have no corresponding captions, so if we use only the provided captions we are evaluating on a 1000-image subset. The README says:

Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy.

While Karpathy's paper indicates:

For MSCOCO we use 5,000 images for both validation and testing

Is the original test actually run on 1000 images, or is the provided caption list incomplete? Thanks, Armand
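
For reference, the arithmetic behind the numbers above can be sketched as follows. This is only a rough check, assuming 5 captions per image and that captions are stored consecutively per image; `caption_coverage` is an illustrative name, not a function from this repository.

```python
def caption_coverage(num_images, num_captions, captions_per_image=5):
    """Return (expected_captions, covered_images, uncovered_images).

    Assumes each image should have `captions_per_image` captions and that
    captions are listed image by image, so the first captions cover the
    first images.
    """
    expected_captions = num_images * captions_per_image
    # Number of images that can be fully paired with the captions available.
    covered_images = min(num_images, num_captions // captions_per_image)
    uncovered_images = num_images - covered_images
    return expected_captions, covered_images, uncovered_images


# Counts reported in this issue: 5000 images, 5000 captions.
expected, covered, uncovered = caption_coverage(5000, 5000)
print(expected, covered, uncovered)
```

With the counts from this issue, the sketch gives 25000 expected captions, 1000 covered images, and 4000 images without captions.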