Open joshmyersdean opened 7 months ago
I am also interested in this issue, which seems to be unresolved: how can I include a bounding box in the prompt and have the model generate a response? I look forward to an answer.
@yutojubako @joshmyersdean Thank you for your attention.
To include the bounding box in the prompt, you need to quantize the coordinates of the input box in advance to get the location tokens, and then use them as input (note that the input must follow the "link" format). Personally, I would first input a detailed caption as a prompt to get the location tokens corresponding to an instance, and then use the obtained location tokens as input for subsequent VQA.
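For readers following along, here is a minimal sketch of the quantization step described above. It is not the maintainers' code, and several details are assumptions rather than facts confirmed in this thread: a 32x32 grid of location bins, box coordinates normalized to [0, 1], token names of the form `<patch_index_NNNN>`, and `<phrase>`/`<object>` tags for the "link" format. The helper names `box_to_location_tokens` and `phrase_with_box` are hypothetical. Please check these against the tokenizer vocabulary you are actually using.

```python
import math


def box_to_location_tokens(box, grid=32):
    """Quantize a normalized (x1, y1, x2, y2) box into two location tokens.

    Assumption: the image is divided into grid x grid bins, numbered row by
    row, and a box is encoded by the bin of its top-left corner and the bin
    of its bottom-right corner.
    """
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError("box must be normalized with x1 < x2 and y1 < y2")

    # Top-left corner: the bin that contains the point.
    ul_x = min(math.floor(x1 * grid), grid - 1)
    ul_y = min(math.floor(y1 * grid), grid - 1)
    # Bottom-right corner: the last bin the box reaches into.
    lr_x = max(math.ceil(x2 * grid) - 1, 0)
    lr_y = max(math.ceil(y2 * grid) - 1, 0)

    ul_index = ul_y * grid + ul_x
    lr_index = lr_y * grid + lr_x
    # Assumed token naming; verify against the real vocabulary.
    return f"<patch_index_{ul_index:04d}>", f"<patch_index_{lr_index:04d}>"


def phrase_with_box(phrase, box, grid=32):
    """Wrap a text span and its box in the assumed "link"-style markup."""
    ul, lr = box_to_location_tokens(box, grid)
    return f"<phrase>{phrase}</phrase><object>{ul}{lr}</object>"


if __name__ == "__main__":
    # Pixel boxes must be normalized first, e.g. x / image_width.
    print(phrase_with_box("a snowman", (0.5, 0.25, 0.75, 0.5)))
```

If these assumptions hold, the resulting string can be placed in the prompt where the referred region should appear, for example before a question about that region.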
@pengzhiliang Thank you for your answer. I now have the quantized bounding box and can convert it to location tokens in the prompt. However, when I try to generate a detailed caption for the region given by those location tokens, the model returns the same string as the input. I suspect the tokens inside the prompt are not being recognized correctly. Is there anything else I should pay attention to? (Or, if you could share the code to reproduce Figure 2(4) that @joshmyersdean mentioned, that would be great.)
@pengzhiliang Could I get some information on the "link" format, or code to convert coordinates into location tokens?
Describe the bug
Model I am using: Kosmos-2
There does not appear to be any documentation on how to provide bounding boxes to Kosmos-2, e.g., to reproduce Figure 2(4).
Thank you!