web-arena-x / visualwebarena

VisualWebArena is a benchmark for multimodal agents.
https://jykoh.com/vwa
MIT License

Release log of success/failures for GPT4+SOM trajectories #58

Closed by sanjari-orb 3 months ago

sanjari-orb commented 3 months ago

Thanks to the authors for releasing the GPT4+SOM trajectories.

However, I do not see a way to tell which traces correspond to successful tasks and which to failed ones. Could this information be released as well?

This was done in the WebArena repository while releasing the GPT execution traces: https://github.com/web-arena-x/webarena/tree/main/resources#1132023-execution-traces-from-our-experiments-v2

kohjingyu commented 3 months ago

This should be available in the zip file you linked, as classifieds_gpt4v_som/results.txt, reddit_gpt4v_som/results.txt, and shopping_gpt4v_som/results.txt:

[Screenshot 2024-07-22 at 7:55:43 PM]

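
For anyone collecting these files after unzipping, here is a minimal sketch. It assumes the archive was extracted to a local directory; the `traces_dir` argument and the `load_results` helper name are hypothetical, not part of the repository. The line format of `results.txt` is not described in this thread, so the lines are returned unparsed.

```python
from pathlib import Path

# Subdirectory names as listed in the reply above.
SITES = ["classifieds_gpt4v_som", "reddit_gpt4v_som", "shopping_gpt4v_som"]


def load_results(traces_dir):
    """Return {site: non-empty lines of <traces_dir>/<site>/results.txt}.

    traces_dir is a hypothetical path to the unzipped archive. Lines are
    kept as raw strings because the file format is an unknown here.
    """
    results = {}
    for site in SITES:
        path = Path(traces_dir) / site / "results.txt"
        if not path.is_file():
            continue  # skip sites that are missing from this extraction
        text = path.read_text(encoding="utf-8", errors="replace")
        results[site] = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return results
```

Calling `load_results("path/to/unzipped_traces")` would give one list of raw lines per site, which can then be inspected by hand to see how successes and failures are recorded.
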
sanjari-orb commented 3 months ago

I missed this, thanks!