Hi @pengzhiliang. I want to fine-tune Kosmos-2 on a VQA task where the answer is a single word (similar to multi-class classification); I call this single word the label. I only have question-answer pairs, not bounding boxes. Should I use <grounding> or not? That is, should I use
<grounding> Question: Are there any <phrase>cats</phrase> in the image? Answer: label
or
Question: Are there any <phrase>cats</phrase> in the image? Answer: label
I am using Kosmos2ForConditionalGeneration.
Another question: is it reasonable to use Kosmos2ForConditionalGeneration for fine-tuning?
Thank you for your patience. @FarzanRahmani
If your downstream task does not involve bounding boxes, there's no need to use <grounding>.
You can use it like this:
Question: {question} Answer: {answer}
or
Question: {question} Answer the question using a single word or phrase. Answer: {answer}
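For illustration, the two templates above could be produced with a small helper like the one below. This is only a sketch: the function name `build_vqa_text` and its parameters are hypothetical and not part of Kosmos-2 or transformers. It only formats the text; no <grounding> token is added.

```python
from typing import Optional


def build_vqa_text(
    question: str,
    answer: Optional[str] = None,
    single_word_hint: bool = False,
) -> str:
    """Format a VQA sample without the <grounding> token.

    Hypothetical helper for illustration only.
    - answer=None gives the prompt alone (useful at inference time).
    - single_word_hint=True adds the optional instruction sentence.
    """
    text = f"Question: {question.strip()}"
    if single_word_hint:
        text += " Answer the question using a single word or phrase."
    text += " Answer:"
    if answer is not None:
        text += f" {answer.strip()}"
    return text


# Training sample (prompt plus label)
train_text = build_vqa_text("Are there any cats in the image?", "yes")

# Inference prompt (no label), with the single-word instruction
infer_text = build_vqa_text(
    "Are there any cats in the image?", single_word_hint=True
)
```

The resulting string would then be passed as the text input, together with the image, in the same way as any other prompt.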