salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Inference only (no finetuning) on VQA #88

Open SKBL5694 opened 2 years ago

SKBL5694 commented 2 years ago

Thanks for your great work, and sorry for bothering you. I only have a single GPU with 6 GB of memory (I could maybe borrow a 3090 for a few days if I'm sure inference will work). My question is: can I run only inference on the VQA task, without finetuning, and if so, how? I'm quite new to this field.

So far I have downloaded vqa.pth and, following the README, started downloading VQA v2. There are many files on that website, and since I only want to run inference/testing, I assume I don't need the training files. I downloaded only the COCO testing images (81,434 images) and the Testing questions 2017 v2.0 (447,793 questions). However, VQA.yaml contains keys I don't think I need, such as 'train_file' and 'vg_root'. Can I run the code without these keys, or are they required?

Also, 'answer_list' seems to be an important key, but I can't find a template file for it, so I don't know how to build my own 'answer_list'.

If these questions are hard to answer because I don't know enough about the datasets (or anything else), please point me to the material I should read first. I checked the related earlier issues, but I still don't understand how to solve my problem. Thank you very much, and sorry for bothering you again.
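To make it concrete, this is roughly what I had in mind: patching VQA.yaml so that only the test-side entries matter. The 'train_file', 'vg_root' and 'answer_list' names come from the yaml file I mentioned; 'test_file' and 'vqa_root' and all the paths below are my guesses, so please correct me if the real keys differ.

```python
import yaml

# My sketch of an inference-only config, built from the shipped VQA.yaml.
# Key names other than train_file / vg_root / answer_list are assumptions;
# I would verify them against configs/VQA.yaml in the repo.
with open('configs/VQA.yaml') as f:
    config = yaml.safe_load(f)

# Placeholder paths -- to be replaced with wherever the test data lives.
config['test_file'] = ['data/vqa_test.json']      # test-split question annotations
config['answer_list'] = 'data/answer_list.json'   # candidate answers the model ranks over
config['vqa_root'] = '/path/to/coco/'             # folder containing the test2015/ images

# If the dataset code still reads the training keys when it builds the loaders,
# keep them present but point them at files that exist.
config['train_file'] = config['test_file']
config['vg_root'] = config['vqa_root']

with open('configs/VQA_eval.yaml', 'w') as f:
    yaml.safe_dump(config, f)
```

Is something like this workable, assuming I then skip the training loop in VQA.py (or does the script already have an evaluate-only switch I missed)? And is my guess right that answer_list is just a JSON list of candidate answer strings that comes with the repo's provided dataset json files rather than from the official VQA site?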