niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
https://arxiv.org/abs/2402.07945

Action Type: meaning of up and down #7

Closed. Lqf-HFNJU closed this issue 4 months ago.

Lqf-HFNJU commented 4 months ago

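Hello! In the ScreenAgent dataset, besides the action types defined in the paper, there are also two types, "up" and "down". What do these two types mean, and in what scenarios are they used? Thanks!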

niuzaisheng commented 4 months ago

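Following the VNC protocol, a mouse Click in ScreenAgent is actually composed of three finer-grained VNC operations: Move -> Down -> Up (move, press, release). A mouse drag in ScreenAgent is composed of two operations, Move and Drag, where Drag itself consists of three finer-grained operations: Down -> Move -> Up. Likewise, a keyboard key press is completed by executing Down and then Up. For the concrete implementation, see client/action.py.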
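To make the decomposition concrete, here is a minimal sketch, assuming only the RFB (VNC) wire formats from RFC 6143: a PointerEvent carries a button mask plus x/y, so "Down" and "Up" amount to setting and clearing a bit in that mask, while a KeyEvent carries an explicit down-flag. This is not ScreenAgent's code (that lives in client/action.py); the classes `Move`, `Down`, `Up` and the helpers `click`, `drag`, `encode_pointer` and `key_press` are hypothetical names chosen for illustration.

```python
import struct
from dataclasses import dataclass
from typing import List, Union

# RFB (RFC 6143) client-to-server message types
KEY_EVENT = 4
POINTER_EVENT = 5
LEFT_BUTTON = 1  # bit 0 of the pointer button mask

# Hypothetical primitive operations (not ScreenAgent's actual classes)
@dataclass
class Move:
    x: int
    y: int

@dataclass
class Down:
    button: int = LEFT_BUTTON

@dataclass
class Up:
    button: int = LEFT_BUTTON

Primitive = Union[Move, Down, Up]

def click(x: int, y: int) -> List[Primitive]:
    # Click = Move -> Down -> Up
    return [Move(x, y), Down(), Up()]

def drag(x0: int, y0: int, x1: int, y1: int) -> List[Primitive]:
    # Drag gesture = Move to the start point, then Drag (Down -> Move -> Up)
    return [Move(x0, y0), Down(), Move(x1, y1), Up()]

def encode_pointer(events: List[Primitive]) -> List[bytes]:
    """Turn primitives into 6-byte RFB PointerEvent messages.

    The current position and button mask are tracked; every primitive
    emits one PointerEvent reflecting the state after it is applied.
    """
    x = y = 0
    mask = 0
    out = []
    for ev in events:
        if isinstance(ev, Move):
            x, y = ev.x, ev.y
        elif isinstance(ev, Down):
            mask |= ev.button   # press: set the button bit
        elif isinstance(ev, Up):
            mask &= ~ev.button  # release: clear the button bit
        # U8 message-type, U8 button-mask, U16 x, U16 y (big-endian)
        out.append(struct.pack(">BBHH", POINTER_EVENT, mask, x, y))
    return out

def key_press(keysym: int) -> List[bytes]:
    # Key press = KeyEvent with down-flag 1, then KeyEvent with down-flag 0
    # U8 message-type, U8 down-flag, 2 padding bytes, U32 keysym
    return [struct.pack(">BBxxI", KEY_EVENT, flag, keysym) for flag in (1, 0)]
```

For example, `encode_pointer(click(100, 200))` yields three PointerEvent messages at (100, 200) whose button masks are 0, 1, 0, which is exactly the Move -> Down -> Up sequence described above. Keeping the primitive list separate from the byte encoding mirrors the layering in the answer: high-level actions are defined purely in terms of Move/Down/Up.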

Lqf-HFNJU commented 4 months ago

Thanks!