lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0
4.94k stars, 420 forks

Ability to add custom Actions that the agent can perform #394

Open · VarunNair31 opened this issue 3 days ago

VarunNair31 commented 3 days ago

Ability to perform custom actions with drivers.

Solution: There should be a way to add our own custom actions. Some websites require specific actions that Selenium or Playwright may not support, and some actions are available in Selenium but cannot be executed by the agent.

If there is a way already please let me know 🤗
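To make the request concrete, here is a minimal sketch of what "adding a custom action" could look like: a name-to-function registry that a driver consults when the agent asks for an action it does not know natively. Everything here (`CUSTOM_ACTIONS`, `register_action`, `run_action`, the stubbed `double_click`) is hypothetical and is not LaVague's API; the action body is a stub so the example runs without a browser.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an action name to a Python callable.
# NOT LaVague's API; it only illustrates one way a driver could accept
# user-defined actions alongside its built-in ones.
CUSTOM_ACTIONS: Dict[str, Callable[..., str]] = {}


def register_action(name: str):
    """Decorator that stores a function under `name` in CUSTOM_ACTIONS."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        if name in CUSTOM_ACTIONS:
            raise ValueError(f"action {name!r} is already registered")
        CUSTOM_ACTIONS[name] = func
        return func
    return decorator


def run_action(name: str, *args, **kwargs) -> str:
    """Look up a registered action by name and call it."""
    try:
        action = CUSTOM_ACTIONS[name]
    except KeyError:
        raise ValueError(f"unknown action {name!r}") from None
    return action(*args, **kwargs)


@register_action("double_click")
def double_click(xpath: str) -> str:
    # A real implementation would drive the browser here; this stub
    # only reports what it would have done.
    return f"double-clicked {xpath}"


print(run_action("double_click", "//div[@id='folder']"))
```

A registry keeps the driver's core untouched: new actions are added by registering a function, and unknown names fail loudly instead of silently doing nothing.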

dhuynh95 commented 3 days ago

Sure! Do you have any specific actions in mind, by the way? Selenium and Playwright are quite exhaustive. Are you asking because the agent failed in a specific scenario? We haven't yet implemented everything needed for full coverage (shadow DOM and iframes will be handled soon). By "action", do you mean an atomic action, like "Click on button", or a sequence, like "Get the next meeting on my calendar", which might be translated into "Click on Calendar", "Click on 10 am meeting"?
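The distinction between an atomic action and a sequence can be sketched as data: a composite goal simply expands into an ordered list of atomic steps. The class names and the `flatten` helper below are illustrative assumptions, not part of LaVague; the example reuses the calendar scenario from the comment above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AtomicAction:
    """A single browser step, e.g. one click on one element."""
    kind: str    # e.g. "click"
    target: str  # human-readable description of the element


@dataclass
class CompositeAction:
    """A higher-level goal that expands into an ordered list of steps."""
    goal: str
    steps: List[Union["AtomicAction", "CompositeAction"]]


def flatten(action: Union[AtomicAction, CompositeAction]) -> List[AtomicAction]:
    """Expand a (possibly nested) composite action into atomic steps."""
    if isinstance(action, AtomicAction):
        return [action]
    result: List[AtomicAction] = []
    for step in action.steps:
        result.extend(flatten(step))
    return result


next_meeting = CompositeAction(
    goal="Get the next meeting on my calendar",
    steps=[
        AtomicAction("click", "Calendar"),
        AtomicAction("click", "10 am meeting"),
    ],
)

for step in flatten(next_meeting):
    print(step.kind, "->", step.target)
```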

VarunNair31 commented 3 days ago

The agent failed while I was trying to execute a test case that required double-clicking an element. It gave me the following error: name 'ActionChains' is not defined. Selenium does have this capability; I'm just not sure whether I can add it to LaVague on my end with custom methods.
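The error message is consistent with generated Selenium code being run in a namespace where `ActionChains` was never imported. In real Selenium, double-clicking is `ActionChains(driver).double_click(element).perform()`, with `ActionChains` imported from `selenium.webdriver.common.action_chains`. The sketch below assumes the generated code is run with Python's `exec` (an assumption about the driver internals) and uses hypothetical stub classes in place of Selenium so it runs without a browser: the same line fails when the name is missing and succeeds once it is supplied.

```python
class StubActionChains:
    """Hypothetical stand-in for selenium's ActionChains that records
    calls instead of driving a browser."""

    def __init__(self, driver):
        self.driver = driver
        self.calls = []

    def double_click(self, element):
        self.calls.append(("double_click", element))
        return self  # ActionChains methods return self so calls can chain

    def perform(self):
        self.driver.performed.extend(self.calls)


class StubDriver:
    """Hypothetical driver that just collects performed actions."""

    def __init__(self):
        self.performed = []


# Code of the kind an agent might generate for "double-click the folder".
generated_code = "ActionChains(driver).double_click(element).perform()"
driver = StubDriver()

# 1) Namespace without ActionChains: reproduces the reported error.
reported_error = None
try:
    exec(generated_code, {"driver": driver, "element": "folder"})
except NameError as err:
    reported_error = str(err)
    print("NameError:", reported_error)

# 2) Namespace that provides the name: the same code now runs.
exec(generated_code, {"driver": driver, "element": "folder",
                      "ActionChains": StubActionChains})
print(driver.performed)
```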

adeprez commented 3 days ago

Currently, the available actions are limited to those implemented in the drivers. We haven't yet considered double-click actions. Could you share your use case with us? I wasn't aware that double-clicks were commonly used in web interfaces.

We can certainly add double-click to the list of available actions. Would you like to contribute this enhancement? It involves adding the necessary code to the exec_code function in the Selenium driver and documenting its usage in the prompt template.
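The second half of that change matters because the model can only emit actions it has been told about. A minimal sketch of the documentation side, assuming the prompt lists allowed actions as one line each: `ACTION_DOCS`, `PROMPT_TEMPLATE`, and `render_prompt` are hypothetical names, and the template wording is invented for illustration. Only the `ActionChains(...).double_click(...).perform()` call reflects Selenium's real API; the snippets are stored as text and never executed here.

```python
from textwrap import dedent

# Hypothetical descriptions of driver actions the LLM is allowed to emit.
# The double-click entry uses Selenium's real ActionChains API; the names,
# wording, and template layout are illustrative assumptions.
ACTION_DOCS = {
    "click": "driver.find_element(By.XPATH, xpath).click()",
    "double_click": (
        "ActionChains(driver).double_click("
        "driver.find_element(By.XPATH, xpath)).perform()"
    ),
}

PROMPT_TEMPLATE = dedent("""\
    You can use the following actions:
    {actions}
    Only use the actions listed above.""")


def render_prompt(action_docs: dict) -> str:
    """Insert one line per documented action into the prompt template."""
    lines = [f"- {name}: {snippet}" for name, snippet in action_docs.items()]
    return PROMPT_TEMPLATE.format(actions="\n".join(lines))


print(render_prompt(ACTION_DOCS))
```

Generating the prompt section from the same table that lists supported actions keeps the code and its documentation from drifting apart.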

VarunNair31 commented 2 days ago

The use case is basically opening a folder in our internal application. I would be happy to contribute, but right now I can't take any free time out of my schedule. I will be sure to make some time and contribute.

dhuynh95 commented 3 hours ago

Is a double-click truly needed, or would a simple click work? Is it some kind of legacy web app? We can do the work, but we need to better understand what you want to achieve. If you are open to it, ping me on Discord and we can schedule a quick call to understand where you are and help you.