madalinabuzau opened this issue 3 months ago.
The agent tends to click on file input elements, but since Selenium can't interact with the operating system's file picker dialog, this action fails. Our SeleniumDriver already has a method that uploads a file using `send_keys`.
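For reference, plain Selenium uploads a file by sending the file's absolute path to the `<input type="file">` element with `send_keys`, so the native dialog never opens. Below is a minimal sketch, assuming a Selenium WebDriver-like `driver` and a CSS selector for the input. `upload_file` is a hypothetical helper, not the existing SeleniumDriver method.

```python
import os


def upload_file(driver, css_selector: str, file_path: str) -> str:
    """Upload a local file through an <input type="file"> element.

    Hypothetical helper: `driver` is any Selenium WebDriver (or an object
    exposing the same find_element API). Selenium cannot drive the OS file
    picker, so instead of clicking the input we type the file path into it.
    """
    abs_path = os.path.abspath(file_path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(abs_path)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    element = driver.find_element("css selector", css_selector)
    # send_keys on a file input sets its selected file; no dialog opens.
    element.send_keys(abs_path)
    return abs_path
```

Usage would look like `upload_file(driver, "input[type='file']", "report.pdf")`. The path is made absolute because browser drivers generally expect a full path for file inputs.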
To ensure the action succeeds, we should guide the LLM through its prompt to use the `set_value` method instead of attempting to `click` on the element. Additionally, we need to ensure that the World Model correctly passes the path of the file to be uploaded.
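Besides prompt changes, a guard in the action layer could catch the failure mode directly. If the model proposes a click on a file input, the guard rewrites it into a `set_value` that carries the path supplied by the World Model. This is a sketch under assumptions: the `Action` dataclass, its fields, and `rewrite_file_input_click` are hypothetical and do not reflect the project's actual action schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Action:
    # Hypothetical action schema: driver method name plus its arguments.
    method: str                  # e.g. "click" or "set_value"
    xpath: str                   # target element
    value: Optional[str] = None  # text or file path for set_value


def rewrite_file_input_click(action: Action, element_tag: str,
                             element_type: Optional[str],
                             file_path: Optional[str]) -> Action:
    """Turn a click on <input type="file"> into set_value with a file path.

    Clicking a file input opens the OS dialog, which Selenium cannot control,
    so the click would fail. All other actions pass through unchanged.
    """
    is_file_input = (element_tag.lower() == "input"
                     and (element_type or "").lower() == "file")
    if action.method != "click" or not is_file_input:
        return action
    if not file_path:
        # The World Model must supply the path; without it we cannot upload.
        raise ValueError("file input targeted but no file path was provided")
    return replace(action, method="set_value", value=file_path)
```

Keeping this check deterministic means the upload still works even when the LLM ignores the prompt guidance and emits a click.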
The necessary code is already in place, but the prompts need to be adjusted accordingly. Would you like to contribute to this feature?
Relates to #406
Thanks Alexis. I did modify the prompt, but the agent still clicks to upload the file rather than using `send_keys`. I think I need to dig deeper into the codebase to sort this out. Happy to contribute to this! Btw, the costs are insane. I think we need much cheaper multimodal models to make this approach feasible.
I have been trying to use the agent to upload a file on a website, and unfortunately it doesn't seem to support that action.