Open 13958806684 opened 2 weeks ago
No. ScreenSpot is an evaluation benchmark for testing VLMs' grounding performance, and it has no corresponding training set. Because SeeClick is never trained on it, the ScreenSpot results demonstrate SeeClick’s generalization capability: it can locate screen elements in unseen scenarios.
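To make "evaluation only" concrete, here is a minimal sketch of how a grounding benchmark like this can be scored. It is not the repository's evaluation script. It assumes a prediction counts as correct when the predicted click point falls inside the target element's bounding box, and that boxes are given as `(left, top, right, bottom)` in the same coordinate system as the point. The names `point_in_box` and `click_accuracy` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Assumed box format: (left, top, right, bottom), same units as the point.
Box = Tuple[float, float, float, float]


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the predicted click point lies inside the box (edges included)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def click_accuracy(predictions: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of samples whose predicted point falls inside the ground-truth box."""
    if len(predictions) != len(boxes):
        raise ValueError("predictions and boxes must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, boxes))
    return hits / len(predictions)


if __name__ == "__main__":
    # Toy values for illustration only, not real benchmark data.
    preds = [(0.5, 0.5), (0.9, 0.9)]
    gts = [(0.4, 0.4, 0.6, 0.6), (0.1, 0.1, 0.2, 0.2)]
    print(click_accuracy(preds, gts))
```

Nothing in this loop updates model weights, which is why no fine-tuning script is needed for a benchmark used this way.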
Hi, thank you very much for your work!
I wonder whether SeeClick is trained on ScreenSpot. I know SeeClick is first pre-trained on the "GUI Grounding Pre-training Data" and then fine-tuned on three downstream agent tasks. But what is the ScreenSpot dataset used for? Is SeeClick trained on it after pre-training on the "GUI Grounding Pre-training Data", or is ScreenSpot just for evaluation? I also didn't see any fine-tuning script for ScreenSpot.
Thank you for your reply.