web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

Fix the unachievable task evaluation when explicit instruction is not provided #79

Closed: shuyanzhou closed this issue 9 months ago

shuyanzhou commented 9 months ago

Our current evaluation assumes the model knows that some tasks are unachievable and will produce N/A in such cases. We should make this more flexible: if the model instead provides a text-based answer describing why it stopped, we should match that answer against the annotated reason and judge accordingly.
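
A minimal sketch of what the more flexible check could look like, not the repository's actual evaluator. The function names (`judge_unachievable`, `token_overlap`), the 0.5 threshold, and the word-overlap similarity are all assumptions made for illustration. The similarity function is pluggable, so a stronger matcher (for example, a model-based judge) could be swapped in.

```python
import re
from typing import Callable, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(answer: str, reason: str) -> float:
    """Hypothetical stand-in similarity: the fraction of the annotated
    reason's words that also appear in the model's answer."""
    reason_tokens = _tokens(reason)
    if not reason_tokens:
        return 0.0
    return len(reason_tokens & _tokens(answer)) / len(reason_tokens)


def judge_unachievable(
    answer: str,
    annotated_reason: str,
    similarity: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # assumed cutoff, not taken from the repo
) -> float:
    """Score an answer to a task annotated as unachievable.

    Returns 1.0 if the model answered "N/A" (the existing behavior), or if
    its free-text explanation is similar enough to the annotated reason.
    Returns 0.0 otherwise.
    """
    if answer.strip().upper() == "N/A":
        return 1.0
    return 1.0 if similarity(answer, annotated_reason) >= threshold else 0.0
```

With this shape, an explicit `N/A` still passes as before, while an explanation such as "I stopped because the website has no refund option" is compared against the annotated reason instead of being rejected outright.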