web-arena-x / visualwebarena

VisualWebArena is a benchmark for multimodal agents.
https://jykoh.com/vwa
MIT License

Reproducing open-source model results #27

Open anithselva opened 6 months ago

anithselva commented 6 months ago

Hello,

I'm looking to reproduce some of the open-source model results from the VWA paper: (1) Mixtral-8x7B as the LLM backbone for the caption-augmented model, and (2) CogVLM as the multimodal model.

Could someone share the flags, commands, or instructions needed to set up these configurations for evaluation?

mlin12321 commented 6 months ago

I would also like the source code to replicate the IDEFICS-80B experiments.

yxchng commented 1 month ago

Any updates?