很好的工作！想请教些问题

showlab / ShowUI

Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent

https://arxiv.org/abs/2411.17465

MIT License

383 stars 16 forks source link

很好的工作！想请教些问题 #9

Closed positive666 closed 13 hours ago

positive666 commented 13 hours ago

1.直接用QWEN2VL去自己设计标注SFT的嘛？ 2.多步的GUI agent执行训练上有什么思路嘛？

QinghongLin commented 13 hours ago

@positive666 Hi 第一个问题不太清楚, 您是指用Qwen2VL来给数据打标吗？第二个问题我们在paper的sec Interleaved VLA streaming里介绍我们的思考哈; 另外您可以加入微信群，直接在群里提问 :)

positive666 commented 13 hours ago

@positive666 Hi 第一个问题不太清楚, 您是指用Qwen2VL来给数据打标吗？第二个问题我们在paper的sec Interleaved VLA streaming里介绍我们的思考哈; 另外您可以加入微信群，直接在群里提问 :) 好的感谢回复我正在读paper，第一个问题是我看Base是QWEN2VL，这个模型可以直接用来训练box\points类grounding的数据嘛？

QinghongLin commented 13 hours ago

@positive666 是的，QWEN2VL可以直接训练box\points类grounding的数据

positive666 commented 13 hours ago

我加了咱们的群有问题是直接在群里抛方便嘛还是私聊，我对这个非常感兴趣，我想作个实验实现一个小场景的fintune 想请教不少问题，另外这个项目目前提供的模型DEMO 使用上是不是不完整，支持多轮嘛

QinghongLin commented 13 hours ago

@positive666 可以在群里直接问的哈，现在支持多轮的，你只需要把上一轮的action output append到下一轮的action history中。咱们目前也在不断完善demo和部署，会不断提升的。