Hello, how are the action coordinates in the training data annotated? Can a large model identify them automatically?

niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

https://arxiv.org/abs/2402.07945

Open dgo2dance opened 4 months ago

dgo2dance commented 4 months ago

Hello, how are the action coordinates in the training data annotated? Can a large model identify them automatically?

niuzaisheng commented 4 months ago

The coordinates were corrected manually; most existing models cannot localize them accurately.

dgo2dance commented 4 months ago

> The coordinates were corrected manually; most existing models cannot localize them accurately.

So after training, the model basically works in new scenarios, right?