salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Fine-tuning BLIP2 on Custom Dataset for Captioning or Classification Tasks #281

GMBarra opened this issue 1 year ago (status: open)

GMBarra commented 1 year ago

Hi, I am interested in fine-tuning the BLIP2 model on a custom dataset for captioning or classification tasks. My custom dataset is formatted similarly to the COCO dataset, consisting of a dictionary with image paths and corresponding image captions. I have two questions regarding this:
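For reference, my annotations look roughly like the COCO-style caption files LAVIS uses: a list of entries pairing an image path with a caption. The file names, captions, and IDs below are just placeholders:

```json
[
  {
    "image": "my_images/0001.jpg",
    "caption": "a short description of the first image",
    "image_id": "0001"
  },
  {
    "image": "my_images/0002.jpg",
    "caption": "a short description of the second image",
    "image_id": "0002"
  }
]
```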

  1. Is it necessary to run pretraining stages 1 and 2 before fine-tuning on my custom dataset?
  2. Which parts of the code should I modify to incorporate my custom dataset? Is it enough to make minimal changes to the caption_coco_ft.yaml file so that it points to my custom dataset, or do I need to create an entirely new dataset format?

Any help or guidance you could provide would be greatly appreciated.

LiJunnan1992 commented 1 year ago

It is suggested to initialize from the stage-2 pretrained model when fine-tuning on a captioning dataset, so you do not need to rerun pretraining stages 1 and 2 yourself. You can create a new dataset builder following the instructions, and then modify caption_coco_ft.yaml to use your own dataset.
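As a sketch of what such a builder can look like, modeled on the shipped COCO caption builder (the builder name `my_caption`, the config path, and the reuse of the generic caption dataset classes are my assumptions; check your LAVIS version for the exact module layout):

```python
from lavis.common.registry import registry
from lavis.datasets.builders.base_dataset_builder import BaseDatasetBuilder
# Reuse the generic caption dataset classes; they expect annotations as a
# JSON list of {"image", "caption", "image_id"} entries like the COCO ones.
from lavis.datasets.datasets.caption_datasets import CaptionDataset, CaptionEvalDataset


@registry.register_builder("my_caption")  # hypothetical builder name
class MyCaptionBuilder(BaseDatasetBuilder):
    train_dataset_cls = CaptionDataset
    eval_dataset_cls = CaptionEvalDataset

    # Maps the config "type" to a dataset config file; the path is hypothetical.
    DATASET_CONFIG_DICT = {
        "default": "configs/datasets/my_dataset/defaults_cap.yaml",
    }
```

The builder has to be imported somewhere (the shipped builders are imported in `lavis/datasets/builders/__init__.py`) so that the `@registry.register_builder` decorator actually runs. The dataset config it points to would then declare where your annotations and images live, mirroring the structure of the shipped COCO defaults (paths below are placeholders):

```yaml
# configs/datasets/my_dataset/defaults_cap.yaml (hypothetical path)
datasets:
  my_caption:
    data_type: images
    build_info:
      annotations:
        train:
          storage: /abs/path/to/my_annotations_train.json
        val:
          storage: /abs/path/to/my_annotations_val.json
      images:
        storage: /abs/path/to/my_images/
```

With that in place, swapping the `coco_caption` entry under `datasets:` in caption_coco_ft.yaml for `my_caption` should make the fine-tuning run pick up the custom data.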