mnotgod96 / AppAgent

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
https://appagent-official.github.io/
MIT License

How Can This Become Faster? #46

Open: MirzaAreebBaig opened this issue 6 months ago

MirzaAreebBaig commented 6 months ago

Hello, I have been testing this, and there is a little lag; it is slow while it operates, even after the learning phase. Is there any plan to improve its speed? Also, how can it support the web? I have been testing it on web apps, but it takes some time, or it needs the correct attributes on a button to recognize it. Thanks for taking the time to answer my question.

mnotgod96 commented 6 months ago

What speed are you referring to? If you think the time between consecutive GPT-4V requests is too long, you can reduce REQUEST_INTERVAL (which defaults to 10) in the config.yaml file. If you feel it takes too long to get the GPT-4V response back, there is not much we can do, since that latency is on OpenAI's side; it is normal to wait 5-10 seconds for a response.

As for web support, you can try using a web browser such as Chrome to see if it helps, but currently AppAgent relies on Android's XML attributes to get the positions of UI elements, and HTML websites are a different story. We added a feature that lets the agent summon a grid overlay and tap a grid area on the screen, but we cannot guarantee that it will work in every case.
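To see how REQUEST_INTERVAL and model latency add up over a task, here is a minimal Python sketch of a throttled request loop. It is not AppAgent's code: `run_rounds`, `estimated_wall_time`, the injectable `sleep` parameter, and the assumption that the interval is measured in seconds are all hypothetical, and the 5-second latency in the usage lines is simply the low end of the 5-10 second range quoted in the reply.

```python
import time
from typing import Callable, List


def run_rounds(
    query_model: Callable[[int], str],
    max_rounds: int,
    request_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical agent loop: one model request per round, with a pause of
    request_interval seconds (assumed unit) between consecutive requests."""
    responses: List[str] = []
    for round_idx in range(max_rounds):
        # The model call itself (e.g. a GPT-4V request) dominates latency;
        # nothing in this loop can make it return faster.
        responses.append(query_model(round_idx))
        # Throttle only between requests, not after the final one.
        if round_idx < max_rounds - 1:
            sleep(request_interval)
    return responses


def estimated_wall_time(max_rounds: int, request_interval: float, avg_latency: float) -> float:
    """Rough estimate of total time for run_rounds: every round waits for a
    response, and every gap between rounds adds request_interval."""
    gaps = max(max_rounds - 1, 0)
    return max_rounds * avg_latency + gaps * request_interval


if __name__ == "__main__":
    # Illustrative numbers only: 20 rounds, 5 s per response (both assumed).
    for interval in (10.0, 2.0):
        total = estimated_wall_time(20, interval, 5.0)
        print(f"REQUEST_INTERVAL={interval:g}s -> about {total:g}s")
```

With these assumed numbers, lowering the interval from 10 to 2 cuts the estimate from about 290 s to about 138 s, while the per-response time stays the same. That matches the reply: the interval is a knob you control, the model latency is not.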
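The grid overlay sidesteps missing XML attributes by letting the model pick a numbered cell on the screenshot instead of a named UI element. The sketch below is a hypothetical illustration of how a cell number plus a named sub-area could be turned into tap coordinates. The 1-based row-major numbering, the `cell_size` parameter, the sub-area names, and the function `grid_cell_to_xy` are all assumptions, not AppAgent's actual implementation.

```python
from typing import Tuple

# Hypothetical sub-area names mapped to (x, y) offsets as fractions of a cell.
SUBAREA_OFFSETS = {
    "top-left": (0.25, 0.25),
    "top": (0.5, 0.25),
    "top-right": (0.75, 0.25),
    "left": (0.25, 0.5),
    "center": (0.5, 0.5),
    "right": (0.75, 0.5),
    "bottom-left": (0.25, 0.75),
    "bottom": (0.5, 0.75),
    "bottom-right": (0.75, 0.75),
}


def grid_cell_to_xy(area: int, subarea: str, width: int, height: int, cell_size: int) -> Tuple[int, int]:
    """Convert a 1-based, row-major grid cell number and a named sub-area
    into absolute screen coordinates (assumed scheme, for illustration)."""
    cols = max(width // cell_size, 1)
    rows = max(height // cell_size, 1)
    if not 1 <= area <= rows * cols:
        raise ValueError(f"area must be in 1..{rows * cols}, got {area}")
    if subarea not in SUBAREA_OFFSETS:
        raise ValueError(f"unknown subarea: {subarea!r}")
    # Stretch cells to cover the whole screen when it is not an exact multiple.
    cell_w = width / cols
    cell_h = height / rows
    row, col = divmod(area - 1, cols)
    fx, fy = SUBAREA_OFFSETS[subarea]
    return int((col + fx) * cell_w), int((row + fy) * cell_h)
```

On a 1080x2400 screen with 120-pixel cells, cell 1 `center` maps to (60, 60) and cell 10 `top-left` maps to (30, 150). The resulting point could then be sent to the device with `adb shell input tap <x> <y>`. Because the tap lands on whatever is drawn at that position, this approach does not care whether the content is a native view or a web page, but accuracy depends on how well the chosen cell and sub-area line up with the target.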