mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
https://grounding-anything.com

Training on New Data #57

Open ajb8866 opened 1 month ago

ajb8866 commented 1 month ago

Hello, I wondered if there was a way to train the model on new data. I am sorry if I missed the documentation somewhere.

hanoonaR commented 1 month ago

Hi @ajb8866,

Thank you for your interest in our work. Yes, you can train or finetune the model on new data. First, decide which data type in GLaMM your new data is closest to, based on the application (image-level captioning, region-level captioning, referring segmentation, or grounded conversation generation). You will find details of these datasets in the documentation. Then convert your data to the required format of the closest data type, paying particular attention to the instruction-tuning format. If you have a specific question about how to adapt something, please feel free to reach out. Thank you.
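As a rough illustration of the conversion step, here is a minimal sketch that maps plain VQA records into the LLaVA-style conversation JSON commonly used for instruction tuning. The input field names (`image`, `question`, `answer`) and the helper itself are hypothetical; check the GLaMM dataset documentation for the exact schema expected by the data type you pick.

```python
import json

def vqa_to_llava(records, out_path):
    """Convert simple VQA records into LLaVA-style conversation JSON.

    Field names follow the public LLaVA convention; the exact format
    required by GLaMM may differ, so verify against the docs.
    """
    converted = []
    for i, rec in enumerate(records):
        converted.append({
            "id": str(i),
            # Image file name, relative to your image folder.
            "image": rec["image"],
            "conversations": [
                # "<image>\n" marks where the image tokens are inserted.
                {"from": "human", "value": "<image>\n" + rec["question"]},
                {"from": "gpt", "value": rec["answer"]},
            ],
        })
    with open(out_path, "w") as f:
        json.dump(converted, f, indent=2)
    return converted

# Example usage with a made-up record:
sample = [{"image": "cat.jpg",
           "question": "What animal is shown?",
           "answer": "A cat."}]
out = vqa_to_llava(sample, "vqa_llava.json")
```

The resulting JSON file can then be pointed at alongside your image folder, the same way the existing dataset annotations are.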

ajb8866 commented 1 month ago

So I have some VQA data in a folder (with images) with a JSON in the llava prompt format. Do I just add it to the list and train like usual?