Add Images for In-Context Learning for World Model

The in-context examples given to our World Model are currently text-only, but images can be included as well.

A recent paper (https://arxiv.org/abs/2405.09798) shows that adding image examples in context can substantially boost the performance of multimodal LLMs.

The authors have a GitHub repository that shows how to do it: https://github.com/stanfordmlgroup/ManyICL/blob/main/ManyICL/LMM.py

We would need to look into LlamaIndex's multimodal support to see how it can be adapted to preserve ordering, so that we can present interleaved (screenshot, objective) -> (instruction) examples to the World Model.
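To make the ordering requirement concrete, here is a minimal sketch of how the interleaved prompt could be assembled. It does not use LlamaIndex; it assumes an OpenAI-style chat content list (`text` and `image_url` parts with base64 data URLs), and the names `Example`, `image_part`, `text_part` and `build_content` are hypothetical helpers for illustration, not existing LaVague or ManyICL code.

```python
import base64
from dataclasses import dataclass


@dataclass
class Example:
    """One in-context demonstration (hypothetical structure)."""
    screenshot_png: bytes  # raw PNG bytes of the page screenshot
    objective: str
    instruction: str


def image_part(png_bytes: bytes) -> dict:
    # Assumption: the target model accepts OpenAI-style base64 data URLs.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def text_part(text: str) -> dict:
    return {"type": "text", "text": text}


def build_content(examples, query_screenshot: bytes, query_objective: str) -> list:
    """Interleave (screenshot, objective) -> (instruction) pairs in order,
    then append the query with the instruction left open for the model."""
    parts = []
    for ex in examples:
        parts.append(image_part(ex.screenshot_png))
        parts.append(
            text_part(f"Objective: {ex.objective}\nInstruction: {ex.instruction}")
        )
    parts.append(image_part(query_screenshot))
    parts.append(text_part(f"Objective: {query_objective}\nInstruction:"))
    return parts
```

The returned list would be passed as the `content` of a single user message, so each screenshot stays directly ahead of the objective and instruction it belongs to. Whether LlamaIndex's multimodal classes preserve this interleaving is the open question to check.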

lavague-ai/LaVague, issue #275