salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License
1.57k stars 199 forks

About VQA annotations #110

Open simplelifetime opened 2 years ago

simplelifetime commented 2 years ago

Hello, thanks for your excellent work. I'm reproducing the results in the repo. I found that the vqa_train annotation files differ from the original VQAv2 annotations. There are some answers in vqa_train that I can't find in either the VQAv2 or the VQAv1 annotations. Is there some data augmentation, or am I missing something? An example: what is written on the bus ['buddy holly', 'buddy holly and crickets'] Neither answer exists in the answer pool or in the annotation files.

LiJunnan1992 commented 2 years ago

Hi, we use the official VQAv2 annotations. Note that QA pairs from Visual Genome are also used during fine-tuning.
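
One way to confirm this is to filter the training entries whose answers never appear in the official VQAv2 answer pool; the leftovers are likely the Visual Genome pairs. The sketch below is illustrative only: the `answer`-list structure is taken from the example in the issue, while the helper name and the toy data are assumptions, not the repo's actual loading code.

```python
def unmatched_answers(train_entries, official_answers):
    """Return entries whose answer list shares nothing with the official VQAv2 pool.

    train_entries: list of dicts with an "answer" key holding a list of strings
    official_answers: iterable of answer strings from the official annotations
    """
    pool = set(official_answers)
    return [e for e in train_entries
            if not any(a in pool for a in e["answer"])]


# Toy data mimicking the vqa_train structure; in practice you would load the
# repo's vqa_train JSON and the official VQAv2 annotation files instead.
train = [
    {"question": "what is written on the bus",
     "answer": ["buddy holly", "buddy holly and crickets"]},
    {"question": "what color is the bus",
     "answer": ["red"]},
]
official_pool = ["red", "blue", "yes", "no"]

suspect = unmatched_answers(train, official_pool)
# Entries in `suspect` are candidates for having come from Visual Genome.
```

Running this over the full files (with the real annotation JSONs in place of the toy lists) should show that the unexplained answers, like the "buddy holly" example above, fall outside the VQAv2 pool.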