lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0
5.45k stars 495 forks source link

Add action to support file upload. #564

Open madalinabuzau opened 2 months ago

madalinabuzau commented 2 months ago

I have been trying to use the agent to upload a file on a website and unfortunately it doesn't seem to have that action.

adeprez commented 2 months ago

The agent tends to click on file input elements, but since Selenium doesn't support interaction with the file system modal, this action fails. We have a method in our SeleniumDriver that can upload a file using send_keys.

To ensure the action succeeds, we should guide the LLM to use the set_value method through its prompt instead of attempting to click on the element. Additionally, we need to ensure that the World Model correctly passes the file path to be uploaded.

The necessary code is already in place, but the prompts need to be adjusted accordingly. Would you like to contribute on this feature?

Relates to #406

madalinabuzau commented 2 months ago

Thanks Alexis. I did modify the prompt but it still clicks to upload the file rather than send_keys. I think I need to dig deeper into the entire codebase to sort this out. Happy to contribute on this! Btw, the costs are insane. I think we need much cheaper multimodal models to make this approach feasible