salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

BLIP2 VQA finetune #204

Open evelinehong opened 1 year ago

evelinehong commented 1 year ago

Hi, we are working on fine-tuning VQA with BLIP-2. Are there any instructions on how to modify the code? When will the fine-tuning code be released?
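For anyone looking for a starting point before official fine-tuning code lands, a minimal sketch of a single VQA fine-tuning step might look like the following. It is only an illustration: `load_model_and_preprocess` is the documented LAVIS loader, but the sample keys consumed by the model's `forward()` (here assumed to be `"text_input"` for the question and `"text_output"` for the answer) and the returned `"loss"` entry depend on your LAVIS version, so check `blip2_t5.py` before relying on them.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a BLIP-2 FlanT5 checkpoint in training mode (is_eval=False keeps training behavior).
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=False, device=device
)

# Typically only the trainable parts (Q-Former and projections) receive gradients;
# the frozen ViT and LLM stay frozen.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

# One illustrative training step on a single (image, question, answer) triple.
raw_image = Image.open("example.jpg").convert("RGB")          # hypothetical sample
image = vis_processors["train"](raw_image).unsqueeze(0).to(device)
question = txt_processors["train"]("what is the cat doing?")
answer = "sleeping"

samples = {
    "image": image,
    "text_input": [question],   # assumption: forward() reads the question from "text_input"
    "text_output": [answer],    # assumption: and the target answer from "text_output"
}

loss = model(samples)["loss"]   # assumption: forward() returns a dict with a "loss" entry
loss.backward()
optimizer.step()
optimizer.zero_grad()
```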

xliucs commented 1 year ago

Same question. I am wondering if there are any updates on this. Thanks!

fmdmm commented 1 year ago

A similar question was asked here: https://github.com/salesforce/LAVIS/issues/125

qwqwq1445 commented 8 months ago

Excuse me, I am also working on fine-tuning VQA on BLIP-2. In the paper, I see that the prompt used for VQA is "Question: {} Answer:". I would like to ask whether my understanding is correct: during training, we do not use the prompt and only feed the original question; during testing, we use the prompt to reformat the question input to get better performance. I would appreciate it if you could kindly help. Thanks.
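For reference, applying the paper's prompt at inference time amounts to simple string formatting of the question before it reaches the model. A rough sketch follows; the exact prompt string, and whether it is wired through a config field or a method argument such as `predict_answers`, may differ across LAVIS versions.

```python
# Sketch of how the VQA prompt from the paper is applied at evaluation time.
vqa_prompt = "Question: {} Answer:"

questions = [
    "what color is the umbrella?",
    "how many people are in the picture?",
]

# At test time the raw question is wrapped in the prompt template.
text_input = [vqa_prompt.format(q) for q in questions]
print(text_input[0])  # Question: what color is the umbrella? Answer:
```

Whether the same wrapping should also happen during training is exactly the question raised in the next comment.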

Hurwitzzz commented 5 months ago

> Excuse me, I am also working on fine-tuning VQA on BLIP-2. In the paper, I see that the prompt used for VQA is "Question: {} Answer:". I would like to ask whether my understanding is correct: during training, we do not use the prompt and only feed the original question; during testing, we use the prompt to reformat the question input to get better performance. I would appreciate it if you could kindly help. Thanks.

Hi, I have the same question. According to the forward() function in blip2_t5.py, it seems like prompts are not used during training. But shouldn't we use the same format during training and evaluation? Did you figure it out? Thanks!
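One way to make training consistent with evaluation, if you decide you want the same format in both phases, is to apply the prompt when building the training batch rather than modifying forward() itself. The sketch below is a hedged illustration: the helper, the prompt string, and the sample keys are assumptions, not the library's actual fine-tuning API.

```python
# Hypothetical helper: wrap raw VQA questions in the evaluation prompt before
# they reach the model, so training and testing see the same input format.
VQA_PROMPT = "Question: {} Answer:"

def build_vqa_samples(images, questions, answers):
    """Assemble a samples dict; the key names assume the blip2_t5 forward()
    reads questions from "text_input" and target answers from "text_output"."""
    return {
        "image": images,
        "text_input": [VQA_PROMPT.format(q) for q in questions],
        "text_output": answers,
    }

# During training:
#   samples = build_vqa_samples(image_batch, question_batch, answer_batch)
#   loss = model(samples)["loss"]
# With this change the prompt is present in both training and evaluation,
# which addresses the train/test mismatch discussed above.
```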