web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

Gold traces #40

Closed by aypan17 10 months ago

aypan17 commented 12 months ago

Hi Shuyan + colleagues, great work! Is there a set of gold traces, one for each task, showing a successful solution? Could it be released? Thank you!

shuyanzhou commented 11 months ago

Hi Alex, thanks! Currently, we don't have reference traces yet, but we are planning to collect such reference traces from humans. Will keep you posted

zhilizju commented 11 months ago

> Hi Alex, thanks! Currently, we don't have reference traces yet, but we are planning to collect such reference traces from humans. Will keep you posted

So, we don't actually know what the human scores are, right? I've found some tasks to be a bit challenging, and I can't solve them either, haha!

shuyanzhou commented 11 months ago

@zhilizju Thanks! Good points!

> So, we don't actually know what the human scores are, right?

We are running a human evaluation and hope to release the human performance numbers in the coming month.

> I've found some tasks to be a bit challenging

This is expected. Certain tasks require familiarity with the site. For instance, many tasks on the CMS site require a fairly sophisticated understanding of it. However, if a task is challenging because its natural language intent is underspecified, feel free to submit an issue.