njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0
182 stars 9 forks

Is SeeClick trained on ScreenSpot? #41

Open 13958806684 opened 2 weeks ago

13958806684 commented 2 weeks ago

Hi, thank you very much for your work!

I wonder whether SeeClick is trained on ScreenSpot. I know SeeClick is first pre-trained on the "GUI Grounding Pre-training Data" and then fine-tuned on three downstream agent tasks. But what is the ScreenSpot dataset used for? Is SeeClick further trained on it after pre-training on the "GUI Grounding Pre-training Data", or is ScreenSpot used only for evaluation? I also couldn't find any fine-tuning script for ScreenSpot.

Thank you for your reply.

njucckevin commented 2 weeks ago

No. ScreenSpot is an evaluation benchmark for testing VLMs' GUI grounding performance, and it has no corresponding training set. SeeClick's results on it therefore demonstrate its generalization capability: it can locate screen elements in scenarios it has never seen during training.
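To make "testing grounding performance" concrete, here is a minimal sketch of how a ScreenSpot-style benchmark could score a model, assuming the metric is click accuracy: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The `BBox` class, the `click_accuracy` function, the normalized-coordinate convention, and the sample values are all hypothetical illustrations, not the repository's actual evaluation script.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical ground-truth element box in normalized [0, 1] coordinates (assumed convention)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A click counts as a hit if it lands inside (or on the border of) the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(samples: Iterable[Tuple[Tuple[float, float], BBox]]) -> float:
    """Fraction of (predicted_point, target_box) pairs where the point hits the box."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in samples)
    return hits / len(samples)


# Usage with made-up predictions (illustrative only, not real SeeClick outputs):
preds = [
    ((0.52, 0.31), BBox(0.45, 0.25, 0.60, 0.35)),  # inside the box -> hit
    ((0.10, 0.90), BBox(0.70, 0.80, 0.95, 0.95)),  # outside the box -> miss
]
print(click_accuracy(preds))  # 0.5
```

Because scoring only compares a predicted point against held-out boxes, no training split is needed: the benchmark measures how well grounding learned elsewhere transfers to new screens.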