Open numbmelon opened 3 months ago
Thank you for your attention and suggestions.

We have done some checking work on our `guiact` data before:

- `web-multi` is annotated by crowdsourcing, and we filter out unreasonable actions using designed rules after the annotation.
- `web-single` is annotated by GPT4-V, and there are many errors in the original data. Therefore, we checked the `web-single` data by crowdsourcing and improved the accuracy from 55% to 92%.
- `smartphone` data is processed from `AITW-general`, and we did not expect such long action sequences in the AITW data.

Thank you again for your questions. We plan to re-check and fix our data in the next two months.
When we finish the check, we will update our data and notify you in this issue and in the "updates" section of this repository.

Thank you for your great work, this has been particularly helpful for me! However, I have encountered some issues while using the `guiact` dataset. These may affect both the training and evaluation processes.

For example, in `smartphone_train_data.json`, I noticed that the task whose uids start with `uid_episode_332227296314252139_step_*` contains an unusually high number of actions (over 100). These actions include an excessive number of `swipe` and `tap` operations, which seems abnormal.

Also, when evaluating the `web-single` test set, I checked the error log and found text like the following:

The bounding boxes marked as answers look like this:

As you can see, this is not very reasonable. Perhaps a single select operation at element 46 could complete the task?

Could these issues be improved in future releases, perhaps through manual review or by using scripts to filter out problematic data? Your attention to this matter would be greatly appreciated. Thank you again for your hard work and for providing such a valuable dataset.
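
As a rough illustration of the script-based filtering suggested above, here is a minimal sketch that flags episodes with more than 100 actions. It is not the maintainers' tooling. It assumes each record is a dict with a `uid` key following the `uid_episode_<id>_step_<n>` pattern quoted in this issue and an `actions` list; both key names are hypothetical and may not match the real layout of `smartphone_train_data.json`.

```python
import re
from collections import defaultdict

# Assumed uid layout, based on the pattern quoted in this issue:
# uid_episode_<episode id>_step_<step index>
UID_PATTERN = re.compile(r"^uid_episode_(\d+)_step_(\d+)$")


def count_actions_per_episode(records):
    """Sum the number of actions for each episode id found in the records."""
    totals = defaultdict(int)
    for record in records:
        match = UID_PATTERN.match(record["uid"])  # "uid" key is an assumption
        if match is None:
            continue  # skip records whose uid does not follow the assumed layout
        totals[match.group(1)] += len(record["actions"])  # "actions" key is an assumption
    return dict(totals)


def find_long_episodes(records, max_actions=100):
    """Return the sorted episode ids whose total action count exceeds max_actions."""
    totals = count_actions_per_episode(records)
    return sorted(ep for ep, n in totals.items() if n > max_actions)


def drop_episodes(records, episode_ids):
    """Return the records that do not belong to any of the given episode ids."""
    flagged = set(episode_ids)
    kept = []
    for record in records:
        match = UID_PATTERN.match(record["uid"])
        if match is not None and match.group(1) in flagged:
            continue  # record belongs to a flagged episode, so drop it
        kept.append(record)
    return kept
```

With records loaded via `json.load`, `find_long_episodes(records)` would list candidate episodes for manual review, and `drop_episodes(records, ...)` would remove them. The threshold of 100 comes from the observation in this issue and is only a starting point.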