AI agents looping vs planning
Predict necessary tools given a request using a recommendation system.
Generate an execution plan (DAG) based on the request, retrieved tools, and their descriptions. Iteratively refine the plan through conversation.
Fine-tune models to predict the final plan from inputs and tools, using examples of successfully executed plans, and implement individual edges as needed.
The goal is to create a probabilistic plan construction process separate from its deterministic execution, producing artifacts for retrieval and few-shot examples, ultimately enabling single-shot output prediction.
Source: Jason Liu post
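A minimal sketch of the idea, assuming a hypothetical plan format and tool registry (none of these names come from the post): the plan is a DAG that an LLM could construct probabilistically, while its execution is purely deterministic.

```python
from graphlib import TopologicalSorter

# Hypothetical plan an LLM might produce for a research request, expressed as a DAG:
# each step names the tool to call and the steps whose outputs it depends on.
plan = {
    "fetch_docs":   {"tool": "search",     "depends_on": []},
    "summarize":    {"tool": "summarizer", "depends_on": ["fetch_docs"]},
    "draft_report": {"tool": "writer",     "depends_on": ["summarize"]},
}

# Hypothetical tool registry; a real system would call actual APIs here.
TOOLS = {
    "search":     lambda inputs: "retrieved documents",
    "summarizer": lambda inputs: f"summary of {inputs}",
    "writer":     lambda inputs: f"report based on {inputs}",
}

def execute_plan(plan):
    """Deterministically execute a plan DAG in dependency order."""
    order = TopologicalSorter(
        {step: set(spec["depends_on"]) for step, spec in plan.items()}
    ).static_order()
    results = {}
    for step in order:
        spec = plan[step]
        inputs = [results[dep] for dep in spec["depends_on"]]
        results[step] = TOOLS[spec["tool"]](inputs)
    return results

print(execute_plan(plan))
```

Plans and traces that execute successfully can then be stored as retrieval artifacts and few-shot examples, which is what makes single-shot plan prediction plausible later on.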
From Andrew Ng's post on Reflection as a design pattern for agentic workflows
Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress this year: Reflection, Tool use, Planning and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output. Here, I'd like to discuss Reflection. For a design pattern that’s relatively quick to implement, I've seen it lead to surprising performance gains.
You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.
Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. After that, we can prompt it to reflect on its own output, perhaps as follows:
Here’s code intended for task X: [previously generated code] Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it.
Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions.
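A minimal sketch of this criticize-and-rewrite loop, assuming a hypothetical `llm(prompt)` wrapper around whatever chat model is being used:

```python
def llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat model API; returns the model's reply."""
    raise NotImplementedError

def reflect_and_refine(task: str, rounds: int = 2) -> str:
    # First attempt: generate the code directly.
    code = llm(f"Write Python code to carry out this task:\n{task}")
    for _ in range(rounds):
        # Reflection step: ask the model to critique its own output.
        critique = llm(
            f"Here's code intended for this task: {task}\n\n{code}\n\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Refinement step: rewrite the code using the critique as added context.
        code = llm(
            f"Task: {task}\n\nPrevious code:\n{code}\n\nFeedback:\n{critique}\n\n"
            "Rewrite the code, applying the feedback."
        )
    return code
```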
And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.
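A sketch of the tool-assisted variant: rather than a free-form critique, the generated code is run against a few unit tests and any failure output is fed back into the rewrite prompt. The `llm` stub and the test-running helper are illustrative, not part of the original post.

```python
import subprocess
import tempfile

def llm(prompt: str) -> str:
    """Hypothetical chat-model wrapper, as in the earlier sketch."""
    raise NotImplementedError

def run_tests(code: str, tests: str) -> str:
    """Run the generated code plus a test snippet; return error output (empty if it passes)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True, text=True)
    return result.stderr

def refine_with_tests(task: str, tests: str, rounds: int = 3) -> str:
    code = llm(f"Write Python code to carry out this task:\n{task}")
    for _ in range(rounds):
        errors = run_tests(code, tests)
        if not errors:
            break  # tests pass; stop reflecting
        # Feed the concrete failure back to the model and ask for a fix.
        code = llm(
            f"Task: {task}\n\nCode:\n{code}\n\nRunning the tests produced:\n{errors}\n\n"
            "Reflect on the error and rewrite the code so the tests pass."
        )
    return code
```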
Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two different agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.
Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications’ results in a few cases. I hope you will try it in your own work. If you’re interested in learning more about reflection, I recommend these papers:
I’ll discuss the other agentic design patterns as well in the future.
[Original text: deeplearning.ai/the-batch/issu… ]
“Self-Refine: Iterative Refinement with Self-Feedback,” Madaan et al. (2023)
“Reflexion: Language Agents with Verbal Reinforcement Learning,” Shinn et al. (2023)
“CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing,” Gou et al. (2024)
Agent Design Patterns: Tool Use
Related
“Gorilla: Large Language Model Connected with Massive APIs,” Patil et al. (2023)
“MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action,” Yang et al. (2023)
“Efficient Tool Use with Chain-of-Abstraction Reasoning,” Gao et al. (2024)
Given a prompt like “If I invest $100 at compound 7% interest for 12 years, what do I have at the end?”, rather than trying to generate the answer directly using a transformer network (which is unlikely to produce the right answer), the LLM might use a code execution tool to run a Python command to compute 100 * (1 + 0.07)**12 and get the right answer. The LLM might generate a string like this: {tool: python-interpreter, code: "100 * (1 + 0.07)**12"}.
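A minimal sketch of how such a structured tool call might be dispatched; the JSON shape and the `python-interpreter` name follow the example above, while the dispatch code is illustrative rather than any particular framework's API:

```python
import json

# Structured tool call as the LLM might emit it, shown here as JSON.
tool_call = json.loads('{"tool": "python-interpreter", "code": "100 * (1 + 0.07) ** 12"}')

def python_interpreter(code: str) -> str:
    """Evaluate a Python expression and return the result as text."""
    return str(eval(code))  # a real agent would sandbox this rather than call eval directly

TOOLS = {"python-interpreter": python_interpreter}

result = TOOLS[tool_call["tool"]](tool_call["code"])
print(result)  # ~225.22, which the LLM can then fold into its final answer
```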
Many tasks can’t be done in a single step or with a single tool invocation, but an agent can decide what steps to take. For example, if we ask an agent to do online research on a given topic, we might use an LLM to break down the objective into smaller subtasks, such as researching specific subtopics, synthesizing findings, and compiling a report.
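A minimal sketch of that decomposition, again assuming a hypothetical `llm` wrapper; the model proposes subtasks as a JSON list and the agent works through them before synthesizing a report:

```python
import json

def llm(prompt: str) -> str:
    """Hypothetical chat-model wrapper, as in the earlier sketches."""
    raise NotImplementedError

def research_agent(topic: str) -> str:
    # Planning step: ask the model to break the objective into subtasks.
    plan = json.loads(llm(
        f"Break the objective 'research {topic} online and write a report' into "
        "3-5 subtasks. Respond with a JSON list of strings only."
    ))
    findings = []
    for subtask in plan:
        # Each subtask could itself involve tool calls (web search, reading pages, ...).
        findings.append(llm(f"Carry out this subtask and report your findings: {subtask}"))
    # Synthesis step: compile the findings into a report.
    return llm("Compile these findings into a report:\n\n" + "\n\n".join(findings))
```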
Related
Jay Alammar Building LLM Agent with tool use - YouTube
Tool use is a method that allows developers to connect Cohere's Command models to external tools like search engines, APIs, databases, and other software.
Just as Retrieval-Augmented Generation (RAG) allows a model to use an external data source to improve factual generation, tool use allows the model to retrieve data from multiple sources. But it goes beyond simply retrieving information: the model can also use software tools to execute code, or even create entries in a CRM system.
In this video, we'll see how we can use two tools to create a simple data analyst agent that is able to search the web and run code in a Python interpreter. This agent uses Cohere's Command R+ model and LangChain.
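As a framework-agnostic sketch of that same loop (not Cohere's or LangChain's actual API), assuming hypothetical `web_search` and `run_python` helpers plus the `llm` wrapper used in the earlier sketches, the agent lets the model choose a tool each turn until it is ready to answer:

```python
import json

def llm(prompt: str) -> str:
    """Hypothetical chat-model wrapper."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Hypothetical web search helper; a real agent would call a search API."""
    raise NotImplementedError

def run_python(code: str) -> str:
    """Hypothetical sandboxed Python interpreter helper."""
    raise NotImplementedError

TOOLS = {"web_search": web_search, "python_interpreter": run_python}

def data_analyst_agent(question: str, max_steps: int = 5) -> str:
    history = f"Question: {question}"
    for _ in range(max_steps):
        # Ask the model to either call a tool or give the final answer, as JSON.
        step = json.loads(llm(
            history + '\n\nRespond with JSON: either '
            '{"tool": "web_search" | "python_interpreter", "input": "..."} '
            'or {"answer": "..."}.'
        ))
        if "answer" in step:
            return step["answer"]
        # Run the chosen tool and append the observation to the running context.
        observation = TOOLS[step["tool"]](step["input"])
        history += f"\n{step['tool']}({step['input']!r}) -> {observation}"
    return llm(history + "\n\nGive the best final answer you can.")
```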