Open likaixin2000 opened 2 months ago
Hi,
For CogAgent, we randomly chose three of their official prompts: prompts = ["What steps do I need to take to \"{}\"?(with grounding)", "Can you advise me on how to \"{}\"?(with grounding)", "I'm looking for guidance on how to \"{}\".(with grounding)"]
For Fuyu, we settled on the prompt after discussing it with the authors in https://huggingface.co/adept/fuyu-8b/discussions/42. It was probably "When presented with a box, perform OCR to extract text contained within it. If provided with text, generate the corresponding bounding box.\n{}"
For Qwen-VL, we followed their official example on GitHub, probably "Generate the bounding box of {}"
In each prompt, {} is the placeholder for the instruction text.
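In case it helps with reproducing this, here is a rough sketch of how the templates above could be filled in. It is not the evaluation code itself: the `build_prompt` helper and the model keys are hypothetical names, and picking one of the three CogAgent templates at random per sample is an assumption, since the reply only says that the three templates were chosen from the official list.

```python
import random

# Templates as quoted above; "{}" is the slot for the instruction text.
COGAGENT_PROMPTS = [
    'What steps do I need to take to "{}"?(with grounding)',
    'Can you advise me on how to "{}"?(with grounding)',
    'I\'m looking for guidance on how to "{}".(with grounding)',
]
FUYU_PROMPT = (
    "When presented with a box, perform OCR to extract text contained within it. "
    "If provided with text, generate the corresponding bounding box.\n{}"
)
QWEN_VL_PROMPT = "Generate the bounding box of {}"


def build_prompt(model, instruction, rng=random):
    """Hypothetical helper: return the filled-in prompt for one instruction."""
    if model == "cogagent":
        # Assumption: one of the three templates is drawn per sample.
        template = rng.choice(COGAGENT_PROMPTS)
    elif model == "fuyu":
        template = FUYU_PROMPT
    elif model == "qwen-vl":
        template = QWEN_VL_PROMPT
    else:
        raise ValueError(f"unknown model: {model}")
    return template.format(instruction)


if __name__ == "__main__":
    for name in ("cogagent", "fuyu", "qwen-vl"):
        print(repr(build_prompt(name, "open the settings menu")))
```

Passing a seeded `random.Random` instance as `rng` keeps the CogAgent template choice reproducible across runs.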
Hi, I am trying to compare models using ScreenSpot. What prompts did you use for Qwen-VL, Fuyu, and CogAgent?