web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

Update evaluators.py #77

Open Miking98 opened 7 months ago

Miking98 commented 7 months ago

Fixes a bug in URL evaluation that causes false positives.

Bug: Previously, the URL ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin/reports/report_product/viewedasdf received a score of 1.0 against the reference URL ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin/reports/report_product/viewed, even though it carries the extra suffix "asdf".

Fix: This commit requires the predicted URL to match the reference URL exactly to receive credit.
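
For illustration, here is a minimal sketch of the false positive described above. It is not the code in evaluators.py: the function names are hypothetical, and the lenient variant assumes the old check behaved like a containment test, which is one plausible way the extra suffix could have been accepted.

```python
# Hypothetical helpers for illustration only; these are NOT the functions
# in evaluators.py, and the lenient check is an assumed stand-in for the
# pre-fix behaviour.

BASE = (
    "ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780"
    "/admin/reports/report_product/"
)


def lenient_url_match(pred: str, ref: str) -> float:
    """Assumed old behaviour: credit whenever the reference occurs inside the prediction."""
    return float(ref in pred)


def exact_url_match(pred: str, ref: str) -> float:
    """Behaviour described in this PR: credit only when the two URLs are identical."""
    return float(pred == ref)


ref_url = BASE + "viewed"
bad_pred = BASE + "viewedasdf"

# The lenient check accepts the wrong URL; the exact check rejects it.
print(lenient_url_match(bad_pred, ref_url))
print(exact_url_match(bad_pred, ref_url))
print(exact_url_match(ref_url, ref_url))
```

Strict equality is the simplest reading of "match exactly". Whether the real evaluator should also normalize things like trailing slashes or query-parameter order before comparing is a separate question that this sketch does not address.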