Jiayi-Pan closed 3 months ago
Thank you very much for the suggestion. We are working on a live leaderboard and we expect to release it (along with VisualWebArena) by next week. Stay tuned.
There are a few works on my radar that use WebArena
TL;DR: the SOTA is still GPT-4 with CoT
We are also evaluating many approaches on our end. I will follow up when we have the results.
@Jiayi-Pan check out the preliminary version here
Dear authors,
Thank you for the awesome work. We are cooking up something using the WebArena environment and wonder whether you know the current state of the art for this benchmark. I checked many of the citations, but none seem to have actually benchmarked on WebArena :(
Also, it would be nice if you could maintain a leaderboard somewhere :-)