mnotgod96 / AppAgent

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
https://appagent-official.github.io/
MIT License
4.97k stars 538 forks

Using the Tongyi Qianwen (Qwen) model in human demonstration mode, the workflow cannot proceed #54

Open tao-taoran opened 7 months ago

tao-taoran commented 7 months ago

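[Image: C256FE78-40E8-4454-B0C4-390F7329FDC9] As shown in the screenshot, at this point the following prompt should in theory appear: "choose one of the following actions you want to perform at the current screen: tap, text, long press, swipe, stop". It does not appear. The screenshot of the phone screen does show up, as in the image below. [Image: 8CF2CFE8-3AAF-4B45-88F8-479016AFA9FA]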

mnotgod96 commented 7 months ago

Once the phone screenshot window pops up, press any key on the keyboard and you can continue.

tao-taoran commented 7 months ago

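> Once the phone screenshot window pops up, press any key on the keyboard and you can continue.

In theory the following prompt should appear at this point: "choose one of the following actions you want to perform at the current screen: tap, text, long press, swipe, stop". It does not appear. In this situation I typed the tap command on the keyboard and the system did not respond at all.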

lxh1121 commented 7 months ago

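After I press any key, the workflow still does not continue. There is another problem: after one operation completes, the workflow pauses again, and I can neither exit nor move on to the next operation. [Image: WeChatd49a85fb17ea649f64e5a4b626822d8d]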

zhelloworld123456 commented 7 months ago

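> > Once the phone screenshot window pops up, press any key on the keyboard and you can continue.
>
> In theory the following prompt should appear at this point: "choose one of the following actions you want to perform at the current screen: tap, text, long press, swipe, stop". It does not appear. In this situation I typed the tap command on the keyboard and the system did not respond at all.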

Could you share your contact details so we can discuss this together? My email: 13426021350@163.com, WeChat: 13426021350. Looking forward to talking with you.

LeeOrange-is-me commented 5 months ago

I found the solution. A new window called 'Image' opens, showing the current screen of your phone. You have to press a key on your keyboard while that new window is focused (not in the terminal). The window then closes and you can continue with your steps.
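For anyone wondering why typing in the terminal does nothing: the behavior described above matches how OpenCV's HighGUI windows work. The sketch below is not taken from the AppAgent source; it assumes the screenshot is shown with `cv2.imshow` followed by `cv2.waitKey(0)`, which blocks until a key event arrives in the image window itself. The function name `show_until_keypress` and the `gui` parameter are hypothetical, added only so the flow can be exercised without a display.

```python
def show_until_keypress(image, window_name="image", gui=None):
    """Show `image` in a window and block until a key is pressed in that window.

    `gui` is a hypothetical injection point: any object exposing imshow,
    waitKey and destroyAllWindows. When omitted, OpenCV (cv2) is used.
    """
    if gui is None:
        import cv2  # assumption: opencv-python is installed
        gui = cv2

    # Open (or update) the window that displays the screenshot.
    gui.imshow(window_name, image)

    # A delay of 0 means "wait indefinitely". The key press must happen while
    # the image window has focus; input typed into the terminal is not seen.
    key = gui.waitKey(0)

    # Close the window so the script can go on to its next prompt.
    gui.destroyAllWindows()
    return key
```

With OpenCV installed, calling `show_until_keypress(img)` on an image loaded by `cv2.imread` would open the window and return only after a key is pressed inside it. If the script really works this way, the action prompt (tap, text, long press, swipe, stop) can only be printed after that call returns, which would explain why nothing happens until the 'Image' window receives a key press.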