njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

Wrong annotations in finetune dataset (widget caption) #14

Closed 921112343 closed 4 months ago

921112343 commented 4 months ago

The bbox annotations in your finetune dataset (widget caption) may be wrong. Every element on the same screen gets the same bbox annotation, which is impossible. For example:

{ "img_filename": "57800.jpg", "instruction": "delete", "bbox": [ 0.8395833333333333, 0.79296875, 0.9854166666666667, 0.828515625 ] },
{ "img_filename": "57800.jpg", "instruction": "dust pin and water drink reminder", "bbox": [ 0.8395833333333333, 0.79296875, 0.9854166666666667, 0.828515625 ] },
{ "img_filename": "57800.jpg", "instruction": "move to trash", "bbox": [ 0.8395833333333333, 0.79296875, 0.9854166666666667, 0.828515625 ] },

njucckevin commented 4 months ago

This is because each element in the widget captioning dataset may correspond to multiple captions. For example, a delete icon on an app screen can be described as "delete", "dust pin and water drink reminder", and "move to trash", so the entries above all refer to the same element. It is common for image captioning datasets to have multiple captions for one image.
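To check this on your own copy of the data, one option is to group entries by image and bbox and list the captions attached to each element. The following is only a minimal sketch: the field names (`img_filename`, `instruction`, `bbox`) are taken from the example in this issue, while the helper name `group_captions` is hypothetical and not part of the SeeClick code. Loading the annotation file is left out, since its path is not given here.

```python
from collections import defaultdict


def group_captions(entries):
    """Group captions by (image, bbox) so that each UI element appears once.

    `group_captions` is a hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for entry in entries:
        # Lists are not hashable, so the bbox is converted to a tuple for the key.
        key = (entry["img_filename"], tuple(entry["bbox"]))
        grouped[key].append(entry["instruction"])
    return dict(grouped)


# The three entries quoted in this issue.
bbox = [0.8395833333333333, 0.79296875, 0.9854166666666667, 0.828515625]
entries = [
    {"img_filename": "57800.jpg", "instruction": "delete", "bbox": bbox},
    {"img_filename": "57800.jpg", "instruction": "dust pin and water drink reminder", "bbox": bbox},
    {"img_filename": "57800.jpg", "instruction": "move to trash", "bbox": bbox},
]

grouped = group_captions(entries)
for (img, box), captions in grouped.items():
    print(img, box, captions)
```

With the three entries above, the grouping yields a single element with three captions, which matches the explanation that they are alternative descriptions of one widget. If a screen really had only one distinct bbox across all of its entries in the full dataset, the same grouping would make that visible as well.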

921112343 commented 4 months ago

Got it, thanks for your response!