lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0
4.96k stars 421 forks source link

Add Images for In Context Learning for World Model #275

Open dhuynh95 opened 1 month ago

dhuynh95 commented 1 month ago

The examples in our world model have only text while there is a way to add images too.

Recent paper (https://arxiv.org/abs/2405.09798) shows that it can greatly help MM LLMs to boost performance.

They have a GitHub where they show to do it: https://github.com/stanfordmlgroup/ManyICL/blob/main/ManyICL/LMM.py

We would have to look into Llama Index Multi Modal to see how we can adapt it to provide order so we can show (screenshot, objective) -> (instruction) to the World Model.

Mecel1147 commented 1 month ago

I‘m trying to work on that