web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

best way to extract action sequence from execution trace? #99

Closed simonalford42 closed 8 months ago

simonalford42 commented 8 months ago

Thanks for this awesome benchmark!

I'm trying to parse and re-execute action sequences from an execution trace, e.g. take render_0.html from the 919_gpt4_8k_cot traces, parse the action sequence proposed by the LLM, and verify whether this action sequence solves the task following the logic used in minimal_example.py.

Is there a way to get a list of Action instances representing the actions taken by the LLM during its execution? My plan was to parse them out of the render_0.html file.

You have guidance on parsing the html from render_{i}.html here: https://github.com/web-arena-x/webarena/blob/main/resources/README.md#render_html

but I don't see how to recover an Action object (https://github.com/web-arena-x/webarena/blob/main/browser_env/actions.py#L94) from that html without writing some janky parsing logic of my own; doable, but it seems like the wrong way to go about things.

I saw there is a function parse_playwright_code for parsing Playwright-format actions, so I thought maybe that's the better way to reproduce the action sequences, but when I unzip the 919_gpt4_8k_cot folder, there is no trace/ folder to be found.

simonalford42 commented 8 months ago

Never mind, I figured something out. You can reparse the actions from the raw predictions:

from webarena.browser_env import create_id_based_action
from webarena.agent import construct_agent
from webarena.run import config, prepare
from bs4 import BeautifulSoup

# build an agent so we can reuse its prompt constructor for parsing
args = config()
args.model = 'gpt-4'
args.instruction_path = 'webarena/agent/prompts/jsons/p_cot_id_actree_2s.json'
prepare(args)
agent = construct_agent(args)

# pull the raw LLM predictions out of the rendered trace
with open('919_gpt4_8k_cot/render_0.html', 'r') as f:
    content = f.read()
soup = BeautifulSoup(content, 'html.parser')
raw_predictions = soup.find_all("div", {"class": "raw_parsed_prediction"})

# extract the action string from each prediction, then convert it into an Action
actions = [agent.prompt_constructor.extract_action(p.pre.text) for p in raw_predictions]
actions = [create_id_based_action(a) for a in actions]
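
From there, replaying the parsed actions can follow the same loop as minimal_example.py. A minimal sketch, assuming the environment settings from the README's example and that task 919's config file lives at config_files/919.json (adjust both to your setup; the import path just mirrors the ones above):

from webarena.browser_env import ScriptBrowserEnv

# spin up the scripted browser environment (settings mirror minimal_example.py;
# use whatever observation_type / viewport the trace was generated with)
env = ScriptBrowserEnv(
    headless=True,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)

# reset to the task the trace was recorded on (config file path is an assumption)
obs, info = env.reset(options={"config_file": "config_files/919.json"})

# replay the re-parsed actions in order
for action in actions:
    obs, _, terminated, _, info = env.step(action)
    if terminated:
        break

env.close()

Scoring the final state should then follow the evaluator logic used in run.py (evaluator_router over the task's config file), though I haven't wired that part up.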