landing-ai / vision-agent


Add Long Term Memory and Feedback #80

Closed: dillonalaird closed this 2 months ago

dillonalaird commented 2 months ago

Adding the remaining two items, long term memory and feedback, for the programming Vision Agent. I also tried to make the vision agent more stateless: chat_with_workflow now returns everything the agent would otherwise have to hold on to as internal state:
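For example (the keys below are the ones referenced in this description; the full return may contain more):

output = agent.chat_with_workflow([{"role": "user", "content": "..."}])

code = output["code"]                      # the generated code
plan = output["plan"]                      # the plan the agent followed
working_memory = output["working_memory"]  # reflections from debugging failed code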

working_memory is the trial-and-error reflections the model creates while debugging failed code. You can grab this and use it as long term memory in future runs:

import vision_agent as va

agent = va.agent.VisionAgentV2()

output = agent.chat_with_workflow([{"role": "user", "content": "..."}])
wm = output["working_memory"]

# can save and load it
wm.save("working_mem")
wm = va.utils.load_sim("working_mem")

# merge with an existing long term memory (ltm)
new_ltm = va.utils.merge_sim(wm, ltm)

# can use working memory as long term memory
agent = va.agent.VisionAgentV2(long_term_memory=new_ltm)
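As a possible follow-on (assuming the merged memory supports the same save / load_sim helpers shown above), you could persist it between sessions:

# persist the merged memory so the next session can start from it
new_ltm.save("long_term_mem")

# in a later session
ltm = va.utils.load_sim("long_term_mem")
agent = va.agent.VisionAgentV2(long_term_memory=ltm)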

If a subtask in the plan fails, chat_with_workflow returns early with the partially completed code and plan. You can pass the partial plan/conversation back to the agent to finish:

output = agent.chat_with_workflow([{"role": "user", "content": "can you code this?"}])

# feed the previous code back as an assistant turn, followed by the correction,
# and pass the old plan so the agent can pick up where it left off
output = agent.chat_with_workflow(
    [{
        "role": "user",
        "content": "can you code this?"
    }, {
        "role": "assistant",
        "content": output["code"],
    }, {
        "role": "user",
        "content": "No, can you use this library?"
    }],
    plan=output["plan"],
)

You can also use the same pattern just to converse with the agent (passing the old plan back is optional and probably only useful if some part of the original plan failed). This way the agent itself stays stateless, and the caller tracks the conversation/plan, roughly as in the sketch below.
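A minimal sketch of that caller-side loop (the follow-up messages and loop structure here are hypothetical; only chat_with_workflow, its return keys, and the plan argument come from the changes above):

import vision_agent as va

agent = va.agent.VisionAgentV2()

# the caller owns the conversation and plan; the agent holds no state between calls
chat = [{"role": "user", "content": "can you code this?"}]
output = agent.chat_with_workflow(chat)

for follow_up in ["No, can you use this library?", "Can you also add comments?"]:
    chat.append({"role": "assistant", "content": output["code"]})
    chat.append({"role": "user", "content": follow_up})
    # passing the old plan back is optional
    output = agent.chat_with_workflow(chat, plan=output["plan"])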