salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

The number of captions of VG #20

Closed: zdou0830 closed this issue 2 years ago

zdou0830 commented 2 years ago

Hi, thanks for the great work!

I wonder why Table 8 reports 769K captions for Visual Genome (VG), while previous papers (e.g., ViLT, UNITER) report ~5M.

LiJunnan1992 commented 2 years ago

Hi, I applied the following filtering to the VG captions:

  1. remove samples that occur in the evaluation sets of COCO or RefCOCO+
  2. remove duplicate sentences within each image
  3. remove sentences whose corresponding region covers less than 20% of the image's area
  4. remove sentences that have fewer than 4 words
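The four filtering steps above can be sketched in Python as follows. This is a hypothetical reconstruction, not the actual ALBEF preprocessing script. The `Region` record, the `filter_vg_captions` helper, the case-insensitive duplicate check, and the assumption that images overlapping the COCO / RefCOCO+ evaluation sets are already given as VG image IDs are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical record for one VG region description (assumed schema).
    image_id: int
    phrase: str
    width: float   # region box width in pixels
    height: float  # region box height in pixels


def filter_vg_captions(regions, image_sizes, eval_image_ids,
                       min_area_ratio=0.2, min_words=4):
    """Return {image_id: [kept captions]} after applying the four filters.

    image_sizes maps image_id -> (width, height); eval_image_ids is an
    assumed set of VG image IDs that overlap COCO / RefCOCO+ eval splits.
    """
    kept = {}
    seen = {}  # image_id -> set of normalized captions already kept
    for r in regions:
        # Step 1: skip images that appear in a downstream evaluation set.
        if r.image_id in eval_image_ids:
            continue
        img_w, img_h = image_sizes[r.image_id]
        # Step 3: skip regions covering less than 20% of the image area.
        if r.width * r.height < min_area_ratio * img_w * img_h:
            continue
        # Collapse repeated whitespace before counting words.
        caption = " ".join(r.phrase.split())
        # Step 4: skip captions with fewer than 4 words.
        if len(caption.split()) < min_words:
            continue
        # Step 2: skip duplicates within the same image (case-insensitive,
        # an assumption; the original only says "duplicate sentences").
        key = caption.lower()
        image_seen = seen.setdefault(r.image_id, set())
        if key in image_seen:
            continue
        image_seen.add(key)
        kept.setdefault(r.image_id, []).append(caption)
    return kept


if __name__ == "__main__":
    regions = [
        Region(1, "a man riding a red bike", 300, 300),
        Region(1, "A man riding a red bike", 400, 400),  # duplicate
        Region(1, "red bike", 400, 400),                 # too few words
        Region(1, "a small sign on the wall", 50, 50),   # region too small
        Region(2, "two dogs playing in the park", 500, 500),  # eval image
    ]
    sizes = {1: (500, 500), 2: (500, 500)}
    print(filter_vg_captions(regions, sizes, eval_image_ids={2}))
```

In this sketch the duplicate check runs last, so a caption is only recorded as "seen" once it has passed the other filters; a real pipeline might order the steps differently, which can change which copy of a duplicate survives.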

Please see issue #2 for further discussion: https://github.com/salesforce/ALBEF/issues/2

Thanks!

zdou0830 commented 2 years ago

Thanks!