microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

Fine tuning kosmos-2 #1562

Closed: FarzanRahmani closed this issue 1 week ago

FarzanRahmani commented 1 month ago

Hi @pengzhiliang. I want to fine-tune Kosmos-2 on a VQA task where the answer is a single word (like a multi-class classification task); I call this single word the label. I only have question-answer pairs, no bounding boxes. I was wondering whether I should use `<grounding>` or not, i.e. should the prompt be `<grounding> Question: Are there any <phrase>cats</phrase> in the image? Answer: label` or `Question: Are there any <phrase>cats</phrase> in the image? Answer: label`? I am using Kosmos2ForConditionalGeneration. Another question: is it reasonable to use Kosmos2ForConditionalGeneration for fine-tuning, or not?

pengzhiliang commented 1 week ago

Thank you for your patience, @FarzanRahmani. If your downstream task does not involve bounding boxes, there's no need to use `<grounding>`. You can format the input like this: `Question: {question} Answer: {answer}` or `Question: {question} Answer the question using a single word or phrase. Answer: {answer}`
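
For reference, here is a minimal sketch of how such a prompt could be fed to Kosmos2ForConditionalGeneration for one training step with the Hugging Face transformers API. The checkpoint name, image path, and question/answer values are assumptions for illustration; it also assumes the model's forward accepts `labels` and returns a loss, as other conditional-generation models in transformers do. Dataset loading and the training loop are omitted.

```python
# Minimal fine-tuning sketch for single-word VQA answers (no bounding boxes).
# "microsoft/kosmos-2-patch14-224", "example.jpg", and the Q/A pair below are
# assumptions; swap in your own checkpoint and data.
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
model.train()

image = Image.open("example.jpg")              # hypothetical image
question = "Are there any cats in the image?"
answer = "yes"                                 # the single-word label

# No bounding boxes, so no <grounding> or <phrase> tags in the prompt.
text = f"Question: {question} Answer: {answer}"
inputs = processor(text=text, images=image, return_tensors="pt")

# Causal-LM objective: use the input ids as labels, but ignore the image
# placeholder positions so they do not contribute to the loss.
labels = inputs["input_ids"].clone()
labels[inputs["image_embeds_position_mask"].bool()] = -100

outputs = model(
    pixel_values=inputs["pixel_values"],
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    image_embeds_position_mask=inputs["image_embeds_position_mask"],
    labels=labels,
)
outputs.loss.backward()   # then optimizer.step() inside your training loop
```

If you want the loss to cover only the answer, you could additionally set the question-token positions in `labels` to -100 so the model is only penalized on the label word.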

FarzanRahmani commented 1 week ago

Thanks for your attention and your answer, @pengzhiliang.