salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

Finetuning VQA on BLIP2 #409

Open zhl98 opened 1 year ago

zhl98 commented 1 year ago

Hi, can you add VQA fine-tuning support for BLIP2? According to the paper, the image encoder is also fine-tuned for the VQA task. When I set freeze_vit: False, the loss and the model parameters become NaN and inf.

Initial weights: image

Gradient of the first step of the model: image

Model weights after the update: image

Can you help me analyze the reason? Thank you very much.

zhl98 commented 1 year ago

The learning rate is 1e-5.

simplelifetime commented 12 months ago

Hi,

Same problem here. Have you fixed it?

zhl98 commented 12 months ago

Update the ViT precision from fp16 to fp32.
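
A minimal sketch of why moving from fp16 to fp32 can remove NaN/inf values, offered as one plausible explanation rather than a confirmed diagnosis of this issue. It uses only the Python standard library to round numbers to half precision. The helper names `to_fp16` and `adam_denominator` are hypothetical, and the numbers are illustrative, not taken from the screenshots above. In LAVIS model configs the ViT precision appears to be controlled by the `vit_precision` key, but the thread does not confirm which setting was changed.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value.

    struct raises OverflowError for values outside the fp16 range; this
    helper maps that case to +/-inf, which is what fp16 arithmetic yields.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def adam_denominator(v: float, eps: float, cast) -> float:
    """Denominator of an Adam-style update, sqrt(v) + eps, under `cast`.

    Simplified on purpose: bias correction and beta terms are left out.
    """
    return cast(math.sqrt(cast(v)) + cast(eps))


FP16_MAX = 65504.0  # largest finite half-precision value

print(to_fp16(FP16_MAX))  # stays finite
print(to_fp16(70000.0))   # too large for fp16, becomes inf
print(to_fp16(1e-8))      # too small for fp16, becomes 0.0

# Illustrative values: a squared gradient of 1e-10 and the common eps of 1e-8.
print(adam_denominator(1e-10, 1e-8, to_fp16))  # 0.0, so the update divides by zero
print(adam_denominator(1e-10, 1e-8, float))    # small but nonzero in higher precision
```

If the optimizer state or the weights themselves are held in fp16, a zero denominator or an overflowed activation is enough to turn the first update into inf or NaN. Keeping the trainable ViT in fp32 avoids both effects, at the cost of more memory.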

BrianG13 commented 10 months ago

@zhl98 Did you manage to fine-tune on VQA? Can you share the code?

qwqwq1445 commented 8 months ago

Excuse me, I am also working on fine-tuning BLIP2 for VQA. In the paper, I see that the prompt used for VQA is "Question: {} Answer:". Is my understanding correct that during training we do not use the prompt and feed only the original question, while at test time we use the prompt to reformat the question and get better performance? I would appreciate any help. Thanks.
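
For reference, a minimal sketch of how the prompt template quoted above can be applied to raw questions. The function name `build_vqa_inputs` and the sample questions are hypothetical. The sketch only shows the string formatting and does not settle whether the prompt should also be used during training.

```python
VQA_PROMPT = "Question: {} Answer:"  # template quoted from the paper in this thread


def build_vqa_inputs(questions, prompt=VQA_PROMPT):
    """Wrap each raw question in the prompt template.

    Passing prompt="{}" leaves the questions unchanged, which corresponds
    to feeding only the original question.
    """
    return [prompt.format(q.strip()) for q in questions]


raw_questions = ["What color is the car?", "How many dogs are there? "]

prompted = build_vqa_inputs(raw_questions)
unprompted = build_vqa_inputs(raw_questions, prompt="{}")

print(prompted[0])    # Question: What color is the car? Answer:
print(unprompted[0])  # What color is the car?
```

Routing both training and evaluation through the same helper makes it easy to compare the two setups by changing only the `prompt` argument.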

shams2023 commented 5 months ago

> Excuse me, I am also working on fine-tuning BLIP2 for VQA. In the paper, I see that the prompt used for VQA is "Question: {} Answer:". Is my understanding correct that during training we do not use the prompt and feed only the original question, while at test time we use the prompt to reformat the question and get better performance? I would appreciate any help. Thanks.

Can you share the code?

WildLight commented 4 months ago

Hi, have you implemented fine-tuning of BLIP2 on the VQA task?

salvatoregrimaUni commented 1 month ago

Hey, is there any news about fine-tuning BLIP-2 on the VQA task?