normal-computing / fuji-web

Fuji is an AI agent that lives in your browser's sidepanel. You can now get tasks done online with a single command!
Apache License 2.0

Evaluate WebWand on the WebArena dataset #154

Open lynchee-owo opened 2 months ago

lynchee-owo commented 2 months ago

Use the WebArena benchmark.

  1. Set up the standalone WebArena environment.
  2. Configure the URLs for each website.
  3. Generate a config file for each test example and obtain the auto-login cookies for all websites.
  4. Write a script that uses WebArena's environment, based on its run.py.
  5. Save the task execution results and evaluate them.
  6. Analyze the evaluation results.
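
For steps 5 and 6, a minimal sketch of how per-task results could be saved and then summarized. This is not WebArena's own API: the `save_result` and `summarize` helpers, the one-JSON-file-per-task layout, and the `task_id` / `site` / `score` fields are all assumptions made for illustration, with `score` assumed to be 1.0 for a passed task and 0.0 for a failed one.

```python
import json
from collections import defaultdict
from pathlib import Path


def save_result(results_dir, task_id, site, score):
    """Write one task's outcome to <results_dir>/<task_id>.json.

    Hypothetical record layout, not a WebArena format.
    """
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {"task_id": task_id, "site": site, "score": score}
    (out / f"{task_id}.json").write_text(json.dumps(record))


def summarize(results_dir):
    """Return (overall_success_rate, {site: success_rate})."""
    per_site = defaultdict(list)
    for path in sorted(Path(results_dir).glob("*.json")):
        record = json.loads(path.read_text())
        per_site[record["site"]].append(record["score"])

    all_scores = [s for scores in per_site.values() for s in scores]
    # With no saved results, report 0.0 instead of dividing by zero.
    overall = sum(all_scores) / len(all_scores) if all_scores else 0.0
    by_site = {site: sum(v) / len(v) for site, v in per_site.items()}
    return overall, by_site
```

The evaluation script would call `save_result` once per finished task, and `summarize` afterwards to get the overall and per-website success rates. Writing one file per task means an interrupted run can be resumed without losing the results saved so far.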