The `Agent` class currently has two issues:

1. The `WorldModel` sometimes uses OCR directly and stops there. However, it might be better not to end on an OCR output but to have the agent write it to some memory (which we call `agent_outputs`) so that this output can be consumed by another tool. For instance, we might want to send an email containing the output of a request to fetch some information from a website.
2. The `WorldModel` sometimes calls the `PythonEngine` to extract information from a given page. However, because it is called afterwards and asked to provide an output, it sometimes does not fully output what was meant to be provided. This is mentioned in #344. To solve this, the agent should end by providing the name of the variable that contains the answer, instead of asking an LLM to read the content of the variables the agent has access to and write it out in a next-token manner. The latter is both expensive, since we consume many LLM output tokens, and potentially imprecise.
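As a rough sketch of this ending step (all names here are illustrative, not the project's actual API), the agent would emit only the variable name, and the framework would return its value verbatim:

```python
# Hypothetical sketch: `variables` stands for the namespace the
# PythonEngine code executed in.
variables = {"scraped_prices": ["$12", "$42", "$7"]}

def final_answer(variable_name: str):
    # The agent only names the variable; the framework returns the value
    # verbatim, with no LLM re-reading or rewriting it token by token.
    return variables[variable_name]

print(final_answer("scraped_prices"))  # → ['$12', '$42', '$7']
```

This avoids both the token cost and the truncation risk of having an LLM transcribe the variable's contents.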
To solve both issues we should:
- Create a specific instruction, like `WRITE_MEMORY`, for the agent to write information from a direct OCR call into memory
- Have the agent end only by providing the name of the variable that contains the information.
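The two steps above can be sketched together as a small instruction dispatcher. This is a minimal illustration under assumed names (`agent_outputs`, `execute`, `RETURN_VARIABLE` are hypothetical, not the project's actual API):

```python
# Minimal sketch combining both fixes, assuming a string-based instruction
# protocol. WRITE_MEMORY stores a result under a key in `agent_outputs`;
# RETURN_VARIABLE ends the run by naming the variable holding the answer.
agent_outputs: dict = {}

def execute(instruction: str, payload: str):
    if instruction == "WRITE_MEMORY":
        # payload has the form "key=value"; store the value for later tools.
        key, _, value = payload.partition("=")
        agent_outputs[key] = value
        return None
    if instruction == "RETURN_VARIABLE":
        # Final step: return the stored value verbatim, no LLM transcription.
        return agent_outputs[payload]
    raise ValueError(f"Unknown instruction: {instruction}")

execute("WRITE_MEMORY", "ocr_result=Price: $42")
print(execute("RETURN_VARIABLE", "ocr_result"))  # → Price: $42
```

With this shape, an OCR result written via `WRITE_MEMORY` stays available to any later tool (such as an email sender), and the final output is read back exactly as stored.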