yangli18 / VLTVG

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

Split of Flickr30K dataset #7

Closed: tiger990111 closed this issue 2 years ago.

tiger990111 commented 2 years ago

Dear maintainers, I found that the split file flickr_train.pth contains 427,193 entries, whereas the training set is supposed to have 29,783 samples. Is this a mistake in data.tar, or how can we get the correct split files? Thanks!

yangli18 commented 2 years ago

@tiger990111 Hi. The training set's 427,193 image-text pairs contain many duplicate images, because one image can have more than one associated text. Only 29,779 distinct training images were actually used. You can recheck this.
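To recheck these numbers, one can count the distinct image names among the entries of the split. The following is a minimal sketch, assuming the split has already been loaded into a list (for example with `torch.load("flickr_train.pth")`) and that each entry is a tuple whose first element is the image file name. That entry layout is an assumption, not something confirmed in this thread, and the toy file names, boxes, and phrases below are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def summarize_split(entries: Iterable[Sequence]) -> dict:
    """Count image-text pairs and distinct images in a grounding split.

    Assumes (hypothetically) that entry[0] is the image file name;
    check this against the real contents of flickr_train.pth.
    """
    # Number of pairs (texts) associated with each image name.
    per_image = Counter(entry[0] for entry in entries)
    return {
        "pairs": sum(per_image.values()),      # total image-text pairs
        "unique_images": len(per_image),       # distinct images
        "max_pairs_per_image": max(per_image.values(), default=0),
    }


# Toy split: two hypothetical images, three phrases.
toy = [
    ("img_a.jpg", [10, 20, 50, 80], "a man in a red shirt"),
    ("img_a.jpg", [60, 15, 120, 90], "a black dog"),
    ("img_b.jpg", [5, 5, 40, 40], "a small boat"),
]
print(summarize_split(toy))
# {'pairs': 3, 'unique_images': 2, 'max_pairs_per_image': 2}
```

Applied to the real split under the assumed layout, `pairs` would correspond to the 427,193 figure and `unique_images` to the number of distinct training images.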

tiger990111 commented 2 years ago

Understood. So there are 29,779 images in the training set, but the training set contains 427,193 image-text pairs.

yangli18 commented 2 years ago

Yes.