salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

How to use your own dataset to train and fine-tune the VQA task of BLIP2-flant5xl #152

Open xcxhy opened 1 year ago

xcxhy commented 1 year ago

Hi, thank you very much for open-sourcing this. I want to use my own images, captions, and QA data to fine-tune BLIP-2. Should I prepare my dataset in the same format as OK-VQA and then run the /run_scripts/blip2/eval/eval_okvqa_zeroshot_flant5xl.sh file? And should I then copy evaluate.py into the run_scripts/blip2/eval/ path? Or is my approach wrong?

LiJunnan1992 commented 1 year ago

Hi, eval_okvqa_zeroshot_flant5xl.sh provides the script for evaluation. You can refer to train_caption_coco.sh for fine-tuning on image captioning. We are still working on providing support for VQA fine-tuning.

Thanks.
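For anyone who wants to sanity-check the zero-shot VQA behaviour that eval_okvqa_zeroshot_flant5xl.sh evaluates while fine-tuning support is pending, here is a minimal sketch along the lines of the LAVIS README; the image path and the question are placeholders, not something from this issue:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load BLIP-2 with the FlanT5-XL language model, the same variant the eval script targets.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)

# "my_image.jpg" and the question are placeholders for your own data.
raw_image = Image.open("my_image.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

answers = model.generate({"image": image, "prompt": "Question: what is shown in the picture? Answer:"})
print(answers)
```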

chenyd0763 commented 1 year ago

Thank you very much for your comments. However, it appears that the code is designed for fine-tuning on the COCO dataset rather than a custom dataset. Would it be possible to modify the code so the model can be fine-tuned on our custom dataset by registering it in the 'builders' directory?

dxli94 commented 1 year ago

@chenyd0763, can you take a look at our tutorial on how to add new datasets? https://opensource.salesforce.com/LAVIS//latest/tutorial.datasets.html
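For reference, the core of that tutorial is registering a builder for your dataset so the training configs can find it by name. A rough sketch is below; MyVQADataset, MyVQAEvalDataset, the module path, and the config path are all placeholders you would define yourself, not part of LAVIS:

```python
from lavis.common.registry import registry
from lavis.datasets.builders.base_dataset_builder import BaseDatasetBuilder

# Hypothetical dataset classes, typically implemented by subclassing one of
# the dataset classes under lavis/datasets/datasets.
from lavis.datasets.datasets.my_vqa_datasets import MyVQADataset, MyVQAEvalDataset


@registry.register_builder("my_vqa")
class MyVQABuilder(BaseDatasetBuilder):
    train_dataset_cls = MyVQADataset
    eval_dataset_cls = MyVQAEvalDataset

    # Hypothetical config that lists the storage paths of your annotation files
    # and images, mirroring the existing files under lavis/configs/datasets.
    DATASET_CONFIG_DICT = {"default": "configs/datasets/my_vqa/defaults.yaml"}
```

A training config can then refer to the dataset by its registered name, my_vqa, in its datasets section.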

dongrixinyu commented 1 year ago

> Hi, eval_okvqa_zeroshot_flant5xl.sh provides the script for evaluation. You can refer to train_caption_coco.sh for fine-tuning on image captioning. We are still working on providing support for VQA fine-tuning.
>
> Thanks.

Looking forward to the training and fine-tuning code.

xcxhy commented 1 year ago

Thanks for your response, I will try it later.


xliucs commented 1 year ago

Also looking forward to the training and fine-tuning code.

mayada24 commented 1 year ago

Is fine-tuning code available?

dreamlychina commented 1 year ago

looking forward to the training and finetuning code

matthewdm0816 commented 1 year ago

looking forward to the training and finetuning code

AbhinavGopal commented 1 year ago

Looking forward to the fine-tuning code for VQA; I think it could lead to some very interesting applications :)

arcb01 commented 1 year ago

looking forward to the fine-tuning code for VQA as well.

edchengg commented 1 year ago

looking forward to the fine-tuning code for VQA +1

robertjoellewis commented 1 year ago

Also looking forward to the fine-tuning support. Is it here yet? :)

essamsleiman commented 1 year ago

Also looking forward to the fine-tuning support!

qwqwq1445 commented 1 year ago

Also looking forward to the fine-tuning code on VQA!

nkjulia commented 1 year ago

Is the VQA fine-tuning code available yet?

NWalker4483 commented 1 year ago

Also looking forward to the fine-tuning code for VQA!

weizhouc commented 1 year ago

Looking forward to fine-tuning for VQA!

lookevink commented 1 year ago

Looking forward to fine-tuning for VQA. At this point I'm just doing captioning and running an LLM of choice, but it will obviously be awesome if VQA can be fine-tuned directly.
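For anyone following the same interim route, a rough sketch of the caption-then-LLM workaround is below. Only the LAVIS captioning calls follow the documented API; the image path, the question, the prompt format, and the final LLM call are placeholders:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# BLIP-2 captioning variant from the LAVIS model zoo (FlanT5-XL, fine-tuned on COCO captions).
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="caption_coco_flant5xl", is_eval=True, device=device
)

raw_image = Image.open("my_image.jpg").convert("RGB")  # placeholder path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

caption = model.generate({"image": image})[0]

# Hand the caption plus the question to whatever LLM you prefer; the prompt
# below is only an illustration, not something defined by LAVIS.
question = "What is the person in the image doing?"
prompt = f"Image description: {caption}\nQuestion: {question}\nAnswer:"
# answer = my_llm_of_choice(prompt)  # hypothetical call to your own LLM
```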

wendyunji commented 1 year ago

Also looking forward to the fine-tuning code for VQA :)

hannahgym commented 10 months ago

Does anybody know if code for BLIP-2 VQA fine-tuning is available? Thanks.

18445864529 commented 10 months ago

I know, no, obviously.

dino-chiio commented 10 months ago

Hi everyone. I have implemented the BLIP-VQA-BASE model for the VQA task here. I hope this implementation can help you, and I would welcome any feedback on it.
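Separately from that implementation, a generic single-step fine-tuning sketch for the Hugging Face Salesforce/blip-vqa-base checkpoint (BLIP, not BLIP-2) might look roughly like the following; the image, question, and answer are placeholders, and in practice you would loop over a DataLoader built from your own dataset:

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder sample; replace with batches from your own VQA dataset.
image = Image.open("my_image.jpg").convert("RGB")
question = "What color is the car?"
answer = "red"

inputs = processor(images=image, text=question, return_tensors="pt").to(device)
labels = processor(text=answer, return_tensors="pt").input_ids.to(device)

model.train()
outputs = model(**inputs, labels=labels)  # loss is the language-modeling loss over the answer tokens
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```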

WildLight commented 5 months ago

> Does anybody know if code for BLIP-2 VQA fine-tuning is available? Thanks.

Hi, have you managed to fine-tune BLIP-2 on the VQA task?