microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Fine tuning kosmos-2 #1562

Closed FarzanRahmani closed 4 months ago

FarzanRahmani commented 5 months ago

Hi @pengzhiliang. I want to fine-tune Kosmos-2 on a VQA task where the answer is a single word (like a multi-class classification task); I call this single word the label. I only have question-answer pairs, no bounding boxes. I was wondering whether I should use `<grounding>` or not, i.e. `<grounding> Question: Are there any <phrase>cats</phrase> in the image? Answer: label` versus `Question: Are there any <phrase>cats</phrase> in the image? Answer: label`. I am using Kosmos2ForConditionalGeneration. Another question: is it reasonable to use Kosmos2ForConditionalGeneration for fine-tuning or not?

pengzhiliang commented 4 months ago

Thank you for your patience, @FarzanRahmani. If your downstream task does not involve bounding boxes, there's no need to use `<grounding>`. You can format the input like this: `Question: {question} Answer: {answer}` or `Question: {question} Answer the question using a single word or phrase. Answer: {answer}`.
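
For reference, a minimal sketch of what a training step with that prompt format could look like via the Hugging Face `Kosmos2ForConditionalGeneration` and `AutoProcessor` classes. This is not from the thread: the dataset fields (`image`, `question`, `answer`), the learning rate, and the choice to train on the full sequence are placeholder assumptions, and it assumes passing `labels` to the model's forward computes the usual causal-LM loss.

```python
# Sketch of a single fine-tuning step on a (image, question, answer) triple,
# using the prompt template suggested above (no <grounding> tag).
import torch
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # placeholder hyperparameter

def training_step(image, question, answer):
    # Build the input without <grounding>, as suggested above.
    text = f"Question: {question} Answer: {answer}"
    inputs = processor(text=text, images=image, return_tensors="pt")

    # Simplest choice: train on the whole sequence. In practice you would
    # likely mask the question tokens with -100 so only the answer
    # contributes to the loss.
    labels = inputs["input_ids"].clone()

    outputs = model(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image_embeds_position_mask=inputs["image_embeds_position_mask"],
        labels=labels,
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```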

FarzanRahmani commented 4 months ago

Thanks for your attention and your answer, @pengzhiliang.