Open dhuynh95 opened 1 month ago
The examples in our World Model are text-only, although images can be included as well.
A recent paper (https://arxiv.org/abs/2405.09798) shows that adding image examples can greatly improve the performance of multimodal LLMs.
The authors have a GitHub repository showing how to do it: https://github.com/stanfordmlgroup/ManyICL/blob/main/ManyICL/LMM.py
We would have to look into LlamaIndex's multi-modal support to see how we can adapt it to preserve ordering, so that we can show (screenshot, objective) -> (instruction) examples to the World Model.
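To make the ordering requirement concrete, here is a minimal sketch of how the (screenshot, objective) -> (instruction) examples could be interleaved before being sent to a multimodal model. It does not use LlamaIndex at all: the `Example` class and the `build_interleaved_content` helper are hypothetical names, and the output assumes an OpenAI-style list of `text` / `image_url` content parts, which may not match what the World Model ends up using.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Example:
    """One few-shot example (hypothetical container, not an existing class)."""
    screenshot_png: bytes
    objective: str
    instruction: str


def image_part(png_bytes: bytes) -> Dict[str, Any]:
    # Assumption: images are passed inline as base64 data URLs.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def text_part(text: str) -> Dict[str, Any]:
    return {"type": "text", "text": text}


def build_interleaved_content(
    examples: List[Example],
    query_screenshot: bytes,
    query_objective: str,
) -> List[Dict[str, Any]]:
    """Keep each screenshot next to its own objective and instruction."""
    parts: List[Dict[str, Any]] = []
    for ex in examples:
        parts.append(image_part(ex.screenshot_png))
        parts.append(text_part(f"Objective: {ex.objective}"))
        parts.append(text_part(f"Instruction: {ex.instruction}"))
    # The final query repeats the pattern and leaves the instruction open.
    parts.append(image_part(query_screenshot))
    parts.append(text_part(f"Objective: {query_objective}"))
    parts.append(text_part("Instruction:"))
    return parts
```

The point of the sketch is only that images and text are emitted in a single ordered list, so the model sees each screenshot immediately before the objective and instruction it belongs to, rather than all images grouped apart from the text.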
I'm trying to work on that.