Our current evaluation assumes the model knows that some tasks are unachievable and will answer N/A in such cases. We should make this more flexible: if the model instead provides a text-based answer explaining why it stopped, we should match that answer against the annotated reason and judge accordingly.
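The relaxed judging rule above could be sketched as follows. This is a minimal illustration, not the actual evaluation code: the function name `judge_unachievable`, the overlap threshold, and the keyword-overlap matcher (a crude stand-in for a proper semantic or LLM-based judge) are all assumptions.

```python
def judge_unachievable(model_answer: str, annotated_reason: str) -> bool:
    """Judge whether the model correctly flagged an unachievable task.

    Accepts either an explicit "N/A" or a free-text explanation whose
    wording overlaps with the annotated reason. Keyword overlap is a
    hypothetical placeholder for a real semantic/LLM-based matcher.
    """
    answer = model_answer.strip().lower()
    # Case 1: the model explicitly answered N/A.
    if answer in ("n/a", "na"):
        return True
    # Case 2: free-text stop reason; compare against the annotation.
    answer_words = set(answer.split())
    reason_words = set(annotated_reason.lower().split())
    if not reason_words:
        return False
    overlap = len(answer_words & reason_words) / len(reason_words)
    return overlap >= 0.5  # threshold is an arbitrary assumption
```

With this rule, both an explicit `"N/A"` and an explanation like "I stopped because the item is out of stock" would be credited against the annotated reason "item is out of stock", while an unrelated answer would not.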