Closed: jayleicn closed this issue 3 years ago.
Hi,
Here are the dataset json files (I've also updated the readme): https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/json_pretrain.zip For each json file, you need to change the image paths to match your own directory structure.
Thanks!
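The path-rewriting step above could be scripted as follows. This is a minimal sketch that assumes each json file is a list of dicts with an `"image"` key (the prefix paths shown are hypothetical placeholders, not the actual paths in the released files):

```python
import json

# Hypothetical path prefixes; adjust for your own setup.
OLD_PREFIX = "/export/share/datasets/coco/images"
NEW_PREFIX = "/data/my_images/coco"

def rewrite_image_paths(in_file, out_file, old_prefix, new_prefix):
    """Load a json file (assumed: a list of dicts with an 'image' key)
    and rewrite the prefix of each image path."""
    with open(in_file) as f:
        samples = json.load(f)
    for sample in samples:
        # Replace only the leading occurrence of the old prefix.
        sample["image"] = sample["image"].replace(old_prefix, new_prefix, 1)
    with open(out_file, "w") as f:
        json.dump(samples, f)
```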
Thanks! This is very helpful. A related question: I noticed the released VG captions contain only 769K captions (see also Table 8 in this work, screenshot 1 below), while UNITER has 5M VG captions (see Table 1 in UNITER, screenshot 2 below). Was any filtering used to obtain the 769K captions from the 5M? Could you elaborate on this process?
Thanks for the prompt reply, I am closing this issue.
Yes there are 4 filters for VG:
- remove samples that occur in the evaluation set of COCO or RefCOCO+
- remove duplicate sentences for each image
- remove sentences whose corresponding region covers less than 20% of the image's area
- remove sentences that have fewer than 4 words
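The four filters above could be sketched as a single predicate. This is a minimal illustration, not the authors' actual code; the field names (`image_id`, `region_area`, `image_area`) are hypothetical stand-ins for the real Visual Genome annotation schema:

```python
def keep_caption(sample, eval_image_ids, seen_per_image):
    """Return True if a VG region caption passes all four filters.

    sample: dict with hypothetical keys 'image_id', 'caption',
            'region_area', 'image_area'.
    eval_image_ids: set of image ids in the COCO / RefCOCO+ eval splits.
    seen_per_image: dict mapping image_id -> set of captions kept so far.
    """
    # (1) drop samples whose image occurs in an evaluation set
    if sample["image_id"] in eval_image_ids:
        return False
    # (2) drop duplicate sentences for the same image
    seen = seen_per_image.setdefault(sample["image_id"], set())
    if sample["caption"] in seen:
        return False
    # (3) drop sentences whose region covers < 20% of the image area
    if sample["region_area"] < 0.2 * sample["image_area"]:
        return False
    # (4) drop sentences with fewer than 4 words
    if len(sample["caption"].split()) < 4:
        return False
    seen.add(sample["caption"])
    return True
```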
Hi @LiJunnan1992, I have some follow-up questions on filters 3 and 4: (1) What is the intuition behind using them? (2) Did you run experiments comparing results with and without them?
Thanks!
Because we perform random image cropping during training, filter (3) reduces the chance that the cropped image no longer matches its text. Filter (4) aims to remove less informative text.
Our experiments do not show any significant difference with or without these samples, so we keep the filters, since removing the samples decreases training time.
This makes sense. Thanks!
Hi @LiJunnan1992,
Congrats on your great work, and thanks for releasing the code!! To help reproduce the pretraining experiments, could you release the dataset json files for the pretraining datasets as well? Thanks!
Best, Jie