stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Lack of Guidance on Optimizing/Finetuning ReAct Agent with Few-shot Examples #703

Open DanielProkhorov opened 4 months ago

DanielProkhorov commented 4 months ago

The current ReAct documentation lacks clear instructions on optimizing or finetuning a ReAct agent using few-shot examples. Neither the main ReAct documentation (ReAct Docs) nor the examples documentation (Examples Docs) provides sufficient guidance in this regard. It's essential to understand that for the ReAct agent to effectively learn from few-shot examples, the complete ReAct cycle (Question, Action, Action Input, Observation) should be encapsulated within these examples.

The provided example in the documentation, such as:

qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")

does not demonstrate the correct way to optimize or finetune a ReAct agent with few-shot examples.
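
For illustration, what seems to be needed instead is an example that carries a whole trajectory. A hypothetical sketch follows; the trajectory field names (thought, action, action_input, observation) are illustrative only, not an actual DSPy API:

# Hypothetical sketch: an example encapsulating one full ReAct cycle.
# The trajectory field names are illustrative, not a documented DSPy schema.
react_pair = dspy.Example(
    question="Which castle did David Gregory inherit?",
    thought="I should search for David Gregory.",
    action="Search",
    action_input="David Gregory",
    observation="David Gregory inherited Kinnairdy Castle in 1664.",
    answer="Kinnairdy Castle",
)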

Could someone please provide a clear example demonstrating the correct approach to optimizing or finetuning a ReAct agent, particularly with few-shot examples? This would greatly benefit users seeking to leverage ReAct effectively.

okhat commented 4 months ago

Agents have not been the priority. But they're no different from other programs:

import dspy

# Define some models.
gpt3 = dspy.OpenAI('gpt-3.5-turbo-0125', max_tokens=1000)
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.configure(lm=gpt3, rm=colbert)

# Declare the agent.
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

# Try it in zero-shot mode.
agent(question="what is 1+1?")

# See what happened in the final N prompts.
gpt3.inspect_history(n=1)

# Get some data to optimize.
from dspy.datasets import HotPotQA

dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=500, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

# Let's optimize
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=5, num_threads=8)
compiled_agent = tp.compile(agent, trainset=trainset[:50], valset=trainset[50:150])

# Now you can use the compiled_agent
compiled_agent(question="how many storeys are in the castle that David Gregory inherited?")
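
By the way, the devset built above isn't used in the snippet; to score the compiled agent on it, here's a minimal sketch using the dspy.evaluate.Evaluate utility with the same metric:

from dspy.evaluate import Evaluate

# Score the compiled agent on the held-out dev split.
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match,
                    num_threads=8, display_progress=True)
evaluate(compiled_agent)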

Hope this helps.

DanielProkhorov commented 4 months ago

Thanks for the quick response @okhat!

Perhaps I should first give a more complete description of the ReAct agent that I intend to optimize.

The objective of this agent is to navigate within a mobile phone app (or any screen in general). As such, the agent integrates the following functionalities (tools):

How would the DSPy framework optimize for this specific task? The screen description and the proposed action are constructed dynamically. From my understanding, the LLM should see the whole ReAct cycle as few-shot examples, rather than just a question and an answer (as in your HotPotQA example). Hence, plain question/answer pairs won't be sufficient.

Currently, I use LangChain and Mixtral 8x7B for this purpose, with a customized ReAct prompt and a few hand-crafted trajectories. Hence, I wonder whether I can switch to the DSPy framework, for exactly the reasons you mention in the FAQ section (https://dspy-docs.vercel.app/docs/faqs).
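
To make this concrete, here is roughly what I would expect the DSPy side to look like, reusing only the pieces from your snippet above. The navigation_examples data and the action_exact_match metric are hypothetical placeholders for my own task:

import dspy
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Hypothetical task data: each example pairs a dynamically constructed
# screen description with the action the agent should ultimately take.
navigation_examples = [
    dspy.Example(screen="Home screen with a 'Settings' icon in the top-right corner.",
                 action="tap('Settings')").with_inputs('screen'),
    # ... more hand-crafted trajectories turned into input/output pairs ...
]

# Hypothetical metric: exact match on the proposed action.
def action_exact_match(example, pred, trace=None):
    return example.action == pred.action

agent = dspy.ReAct("screen -> action")  # my app-specific tools would be plugged in here

# My assumption: the optimizer bootstraps full ReAct trajectories
# (Thought/Action/Observation) from these input/output pairs.
tp = BootstrapFewShotWithRandomSearch(metric=action_exact_match,
                                      max_bootstrapped_demos=2, num_candidate_programs=5)
compiled_navigator = tp.compile(agent, trainset=navigation_examples)

Is it correct that the optimizer records the full Thought/Action/Observation traces as demos, so the cycle itself doesn't have to be hand-written?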