uclanlp / visualbert

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

COCO pre-training size #17

Closed e-bug closed 3 years ago

e-bug commented 4 years ago

Hi! Could you share the size of pre-training data? I saw that you extend the training set with part of the validation set.

liunian-harold-li commented 3 years ago

I think it was 100k images with 5 captions each, so roughly 500k image-caption pairs. I used the MSCOCO train+val data, which is the conventional setup.
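If you want to verify these counts yourself, a minimal sketch like the following can tally images and captions from a COCO-style captions annotation dict (the schema with `"images"` and `"annotations"` lists is the standard COCO captions format; the toy data below is illustrative, not the real annotation file):

```python
from collections import defaultdict

def caption_stats(annotations):
    """Count images, captions, and captions-per-image in a
    COCO-style captions annotation dict (keys "images", "annotations")."""
    n_images = len(annotations["images"])
    n_captions = len(annotations["annotations"])
    per_image = defaultdict(int)
    for ann in annotations["annotations"]:
        per_image[ann["image_id"]] += 1
    avg = n_captions / n_images if n_images else 0.0
    return n_images, n_captions, avg

# Tiny synthetic example in the same schema: 2 images, 5 captions each.
# For the real data you would json.load e.g. captions_train2014.json.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": i, "caption": "a caption"}
        for i in (1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
    ],
}
print(caption_stats(toy))  # (2, 10, 5.0)
```

Running the same function over the train and val annotation files and summing the results should reproduce the ~100k images / ~500k captions figure.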