salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Fine-tuning BLIP2 on Custom Dataset for Captioning or Classification Tasks #281

GMBarra opened this issue 1 year ago (status: open)

GMBarra commented 1 year ago

Hi, I am interested in fine-tuning the BLIP2 model on a custom dataset for captioning or classification tasks. My custom dataset is formatted similarly to the COCO dataset, consisting of a dictionary with image paths and corresponding image captions. I have two questions regarding this:
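For reference, my annotations look roughly like the COCO-style caption files LAVIS uses: a list of entries pairing an image path with a caption. The file names, captions, and IDs below are just placeholders:

```json
[
  {
    "image": "my_images/0001.jpg",
    "caption": "a short description of the first image",
    "image_id": "0001"
  },
  {
    "image": "my_images/0002.jpg",
    "caption": "a short description of the second image",
    "image_id": "0002"
  }
]
```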

  1. Is it necessary to run pretraining stages 1 and 2 before fine-tuning on my custom dataset?
  2. Which parts of the code should I modify to incorporate my custom dataset? Is it enough to make minimal changes to the caption_coco_ft.yaml file so that it points to my custom dataset, or do I need to create an entirely new dataset format?

Any help or guidance you could provide would be greatly appreciated.

LiJunnan1992 commented 1 year ago

It is suggested to initialize from the stage-2 pretrained model when fine-tuning on a captioning dataset, so you do not need to rerun pretraining stages 1 and 2 yourself. You can create a new dataset builder following the instructions, and then modify caption_coco_ft.yaml to use your own dataset.
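As a sketch of what such a builder can look like, modeled on the shipped COCO caption builder (the builder name `my_caption`, the config path, and the reuse of the generic caption dataset classes are my assumptions; check your LAVIS version for the exact module layout):

```python
from lavis.common.registry import registry
from lavis.datasets.builders.base_dataset_builder import BaseDatasetBuilder
# Reuse the generic caption dataset classes; they expect annotations as a
# JSON list of {"image", "caption", "image_id"} entries like the COCO ones.
from lavis.datasets.datasets.caption_datasets import CaptionDataset, CaptionEvalDataset


@registry.register_builder("my_caption")  # hypothetical builder name
class MyCaptionBuilder(BaseDatasetBuilder):
    train_dataset_cls = CaptionDataset
    eval_dataset_cls = CaptionEvalDataset

    # Maps the config "type" to a dataset config file; the path is hypothetical.
    DATASET_CONFIG_DICT = {
        "default": "configs/datasets/my_dataset/defaults_cap.yaml",
    }
```

The builder has to be imported somewhere (the shipped builders are imported in `lavis/datasets/builders/__init__.py`) so that the `@registry.register_builder` decorator actually runs. The dataset config it points to would then declare where your annotations and images live, mirroring the structure of the shipped COCO defaults (paths below are placeholders):

```yaml
# configs/datasets/my_dataset/defaults_cap.yaml (hypothetical path)
datasets:
  my_caption:
    data_type: images
    build_info:
      annotations:
        train:
          storage: /abs/path/to/my_annotations_train.json
        val:
          storage: /abs/path/to/my_annotations_val.json
      images:
        storage: /abs/path/to/my_images/
```

With that in place, swapping the `coco_caption` entry under `datasets:` in caption_coco_ft.yaml for `my_caption` should make the fine-tuning run pick up the custom data.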