Ashwin-Ramesh2607 closed this issue 4 years ago.
A given image appears 5 times in the training set, each time paired with a different caption. See tokenize_captions.py for details.
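A minimal sketch of that expansion, assuming the annotations follow the COCO captions JSON layout (a list of dicts with `image_id` and `caption` keys). `expand_to_pairs` is a hypothetical helper and the captions below are made-up toy data; this is not the actual code in tokenize_captions.py.

```python
from collections import Counter


def expand_to_pairs(annotations):
    """Hypothetical helper: emit one (image_id, caption) training example per caption.

    Assumes COCO-style annotation dicts with 'image_id' and 'caption' keys.
    An image with 5 captions therefore yields 5 separate examples.
    """
    return [(ann["image_id"], ann["caption"].strip()) for ann in annotations]


# Made-up toy annotations: two images with 5 captions each.
annotations = [
    {"image_id": 1, "caption": "A dog runs along the beach."},
    {"image_id": 1, "caption": "A brown dog playing near the ocean."},
    {"image_id": 1, "caption": "A dog running on wet sand."},
    {"image_id": 1, "caption": "A puppy sprints by the water."},
    {"image_id": 1, "caption": "A dog at the seaside."},
    {"image_id": 2, "caption": "A man riding a bicycle."},
    {"image_id": 2, "caption": "A cyclist on a city street."},
    {"image_id": 2, "caption": "Someone bikes past parked cars."},
    {"image_id": 2, "caption": "A person on a bike in traffic."},
    {"image_id": 2, "caption": "A man pedals down the road."},
]

pairs = expand_to_pairs(annotations)
# Each image id appears once per caption in the expanded training set.
counts = Counter(image_id for image_id, _ in pairs)
```

Because every caption becomes its own example, no single caption has to be chosen: the model sees each image once per caption in every epoch.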
@krasserm Got it, I understand what happens during training. But what do we do for evaluation? Do you take the highest score among all 5 captions and evaluate based on that?
Hey, my question is about how the COCO dataset is used during training. The COCO homepage states that every image has 5 captions. When you train with your repo, which caption do you use? How do you handle the fact that there are multiple captions per image?