web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

What's the current SOTA? #114

Closed: Jiayi-Pan closed 3 months ago

Jiayi-Pan commented 3 months ago

Dear authors,

Thank you for the awesome work. We are building something on the WebArena environment and wonder whether you know the current state of the art for this benchmark. I checked many of the citations, but none seem to have actually benchmarked on WebArena :(

Also, it would be nice if you could maintain a leaderboard somewhere :-)

shuyanzhou commented 3 months ago

Thank you very much for the suggestion. We are working on a live leaderboard and we expect to release it (along with VisualWebArena) by next week. Stay tuned.

There are a few works on my radar that use WebArena.

TL;DR: the SOTA is still GPT-4 with CoT.

We are also evaluating many approaches on our end. I will follow up when we have the results.

shuyanzhou commented 3 months ago

@Jiayi-Pan check out the preliminary version here