web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

Wrong scroll action specification in the prompts vs. in the parsing function #110

Open michalspiegel opened 3 months ago

michalspiegel commented 3 months ago

Hi, I'm reporting a small bug: the prompts all say to use "scroll [direction=up|down]", but the parsing function expects "scroll [up|down]". As a result, parsing always fails for the scroll action, because the model follows the prompt and generates "scroll [direction=up|down]".
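One possible way to reconcile the two formats is to make the parser tolerant of both. The snippet below is only a minimal sketch, not the repository's actual parsing code: the names `SCROLL_RE` and `parse_scroll_direction` are hypothetical, and it assumes the action arrives as a single string such as "scroll [down]" or "scroll [direction=down]".

```python
import re

# Hypothetical pattern (not taken from the WebArena code base): accepts both
# "scroll [down]" and "scroll [direction=down]".
SCROLL_RE = re.compile(r"^scroll\s*\[(?:direction=)?(up|down)\]$", re.IGNORECASE)


def parse_scroll_direction(action: str) -> str:
    """Return "up" or "down" for a scroll action string, or raise ValueError."""
    match = SCROLL_RE.match(action.strip())
    if match is None:
        raise ValueError(f"Invalid scroll action: {action!r}")
    return match.group(1).lower()


if __name__ == "__main__":
    # Both spellings resolve to the same direction.
    print(parse_scroll_direction("scroll [direction=down]"))
    print(parse_scroll_direction("scroll [up]"))
```

The alternative fix would be to change the prompts so they match what the parser expects; accepting both spellings just avoids depending on which format a given model prefers.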

frankxu2004 commented 2 months ago

@shuyanzhou

michalspiegel commented 2 months ago

On the other hand, judging from your GPT-4 experiment logs, this was not a problem for GPT-4. Maybe it is specific to Gemini, which I was using: it always tried to generate the action in the wrong format, e.g. scroll [direction=down]. In that case, this issue might be irrelevant.