Open jamiechoi1995 opened 6 years ago
The split produced by this repo is different from the split distributed at https://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip (known as "Karpathy's split" in many papers; see also https://github.com/kelvinxu/arctic-captions/tree/master/splits).
In this repo, the images of the training and val sets are first combined (see: https://github.com/karpathy/neuraltalk2/blob/master/coco/coco_preprocess.ipynb),
then shuffled and split into training, val, and test sets (see: https://github.com/karpathy/neuraltalk2/blob/bd8c9d879f957e1218a8f9e1f9b663ac70375866/prepro.py#L159).
As a result, the test set contains many images that originally come from the training set.
The test split in https://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip, by contrast,
contains only images that originally come from the val set.
I'm noting this here to avoid confusion when reproducing results from papers based on Karpathy's split.
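
For anyone who wants to check this on their own files, here is a minimal sketch that counts, per split, how many images come from each original COCO subset. It is not code from this repo. The helper names `origin_of` and `count_origins` are hypothetical, and the field names (`split`, `filepath`, `file_path`, `filename`) are assumptions about the JSON records, so adjust them to whatever your files actually contain. It also assumes the original subset (`train2014` or `val2014`) appears somewhere in the path or file name.

```python
from collections import Counter, defaultdict


def origin_of(path):
    """Guess the original COCO subset from a path or file name,
    e.g. 'val2014/COCO_val2014_000000391895.jpg' -> 'val2014'."""
    for subset in ("train2014", "val2014"):
        if subset in path:
            return subset
    return "unknown"


def count_origins(images, split_key="split",
                  path_keys=("filepath", "file_path", "filename")):
    """Return {split: {original_subset: image_count}}.

    `images` is an iterable of dicts. The key names are assumptions;
    the first key from `path_keys` present in a record is used.
    """
    counts = defaultdict(Counter)
    for img in images:
        path = next((img[k] for k in path_keys if k in img), "")
        counts[img[split_key]][origin_of(path)] += 1
    return {split: dict(c) for split, c in counts.items()}


# Toy records for illustration only (not real dataset entries).
toy = [
    {"split": "test", "filepath": "val2014",
     "filename": "COCO_val2014_000000000001.jpg"},
    {"split": "test",
     "file_path": "train2014/COCO_train2014_000000000002.jpg"},
    {"split": "train",
     "file_path": "train2014/COCO_train2014_000000000003.jpg"},
]
result = count_origins(toy)
print(result)
```

If a test split reports a non-zero `train2014` count, as the toy one here does, it mixes in images from the original training set, which is the situation described above for this repo's split.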