Agent hallucinates the final answer

GabrielFritz commented 1 year ago

Hello!

I was following the recent blog post: https://blog.langchain.dev/custom-agents/

Then I noticed something. Sometimes the agent jumps to a conclusion even though the information required to reach it is not available in the observations from the intermediate steps.

Here is the code I used (very similar to the blog post, but I modified the prompt slightly to force the agent to use only the information returned by the tool):

from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import StringPromptTemplate
from langchain import OpenAI, LLMChain
from langchain.utilities import WikipediaAPIWrapper
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish
import re
from termcolor import colored
import os

# os.environ["OPENAI_API_KEY"] = 
# os.environ["SERPAPI_API_KEY"] = 

search = WikipediaAPIWrapper()

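def search_wikipedia(query):
    # Run the Wikipedia search; `query` avoids shadowing the built-in input().
    result = search.run(query)

    # Cap string results at 5000 characters; anything else counts as "not found".
    if isinstance(result, str):
        return result[:5000]
    return "Agent could not find a result."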

tools = [
    Tool(
        name="Wikipedia",
        description="Useful for finding information about a specific topic. You cannot use this tool to ask questions, only to find information about a specific topic.",
        func=search_wikipedia,
    )
]

template = """I want you to be FritzAgent. An agent that use tools to get answers. You are reliable and trustworthy. You follow the rules:

Rule 1: Answer the following questions as best as you can with the Observations presented to you.
Rule 2: Never use information outside of the Observations presented to you.
Rule 3: Never jump to conclusions unless the information for the final answer is explicitly presented to you in Observation.

You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
Thought: you should always think about what to do next. Use the Observation to gather extra information, but never use information outside of the Observation.
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer.
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}
"""

class CustomPromptTemplate(StringPromptTemplate):

    template: str
    tools: List[Tool]

    def format(self, **kwargs) -> str:

        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""

        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "

        kwargs["agent_scratchpad"] = thoughts
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)

prompt = CustomPromptTemplate(template=template, tools=tools, input_variables=["input", "intermediate_steps"])

class CustomOutputParser(AgentOutputParser):

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:

        if "Final Answer:" in llm_output:
            return AgentFinish(return_values={"output": llm_output.split("Final Answer:")[1].strip()},
                               log=llm_output
            )

        regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse output: {llm_output}")
        action = match.group(1).strip()
        action_input = match.group(2).strip()
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()

llm = OpenAI(temperature=0)

llm_chain = LLMChain(llm=llm, prompt=prompt)
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(llm_chain=llm_chain, output_parser=output_parser, stop=["\nObservation:"], allowed_tools=tool_names)

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

while True:
    user_input = input(colored("> ", "green", attrs=["bold"]))
    if user_input == "exit":
        break
    output = agent_executor.run(input=user_input)
    print(colored("FritzGPT:\n", "red"))
    print(output)

Input: What is Leo DiCaprio current relationship status?

Output:

> Entering new AgentExecutor chain...
Thought: I should look for information about Leo DiCaprio's relationship status.
Action: Wikipedia
Action Input: Leo DiCaprio

Observation:Page: Leonardo DiCaprio
Summary: Leonardo Wilhelm DiCaprio (, ; Italian: [diˈkaːprjo]; born November 11, 1974) is an American actor and film producer. Known for his work in biographical and period films, he is the recipient of numerous accolades, including an Academy Award, a British Academy Film Award and three Golden Globe Awards. As of 2019, his films have grossed over $7.2 billion worldwide, and he has been placed eight times in annual rankings of the world's highest-paid actors.
Born in Los Angeles, DiCaprio began his career in the late 1980s by appearing in television commercials. In the early 1990s, he had recurring roles in various television shows, such as the sitcom Parenthood, and had his first major film part as author Tobias Wolff in This Boy's Life (1993). He received critical acclaim and his first Academy Award and Golden Globe Award nominations for his performance as a developmentally disabled boy in What's Eating Gilbert Grape (1993). DiCaprio achieved international stardom with the star-crossed romances Romeo + Juliet (1996) and Titanic (1997). After the latter became the highest-grossing film at the time, he reduced his workload for a few years. In an attempt to shed his image of a romantic hero, DiCaprio sought roles in other genres, including crime drama in Catch Me If You Can (2002) and Gangs of New York (2002); the latter marked the first of his many successful collaborations with director Martin Scorsese.
DiCaprio earned Golden Globe nominations for his performances in the biopic The Aviator (2004), the political thriller Blood Diamond (2006), the crime drama The Departed (2006) and the romantic drama Revolutionary Road (2008). In the 2010s, he made environmental documentaries and starred in several high-profile directors' successful projects, including the action thriller Inception (2010), the western Django Unchained (2012), the biopic The Wolf of Wall Street (2013), the survival drama The Revenant (2015)—for which he won the Academy Award for Best Actor—and the comedy-drama Once Upon a Time in Hollywood (2019).
DiCaprio is the founder of Appian Way Productions—a production company that has made some of his films and the documentary series Greensburg (2008–2010)—and the Leonardo DiCaprio Foundation, a nonprofit organization devoted to promoting environmental awareness. A United Nations Messenger of Peace, he regularly supports charitable causes. In 2005, he was named a Commander of the Ordre des Arts et des Lettres for his contributions to the arts, and in 2016, he appeared in Time magazine's 100 most influential people in the world. DiCaprio was voted one of the 50 greatest actors of all time in a 2022 readers' poll by Empire.

Page: George DiCaprio
Summary: George Paul DiCaprio (born October 2, 1943) is an American writer, editor, publisher, distributor, and former performance artist, known for his work in the realm of underground comix. DiCaprio has collaborated with Timothy Leary and Laurie Anderson. He is the father of actor Leonardo DiCaprio.

Page: List of awards and nominations received by Leonardo DiCaprio
Summary: American actor Leonardo DiCaprio has won 101 awards from 252 nominations. He has been nominated for seven Academy Awards, five British Academy Film Awards and eleven Screen Actors Guild Awards, winning one from each of these and three Golden Globe Awards from thirteen nominations.
DiCaprio received three Young Artist Award nominations for his roles in television shows during the early 1990s—the soap opera Santa Barbara (1990), the dramedy Parenthood (1990) and the sitcom Growing Pains (1991). This was followed by his film debut in the direct-to-video feature Critters 3 (1991). He played a mentally challenged boy in the drama What's Eating Gilbert Grape (1993), a role that earned him nominations for the Academy Award and Golden Globe Award for Best Supporting Actor. Three years later, he appeared in Romeo + Juliet, for which he earned a Best Actor award from the Berlin International Film Festival. DiCaprio featured opposite Kate Winslet in the romantic drama Titanic (1997), the highest-grossing film to that point. For the film, he garnered the MTV Movie Award for Best Male Performance and his first Golden Globe Award for Best Actor nomination. For a role in The Beach, he was nominated for two Teen Choice Awards (Choice Actor and Choice Chemistry) but also a Golden Raspberry Award for Worst Actor. DiCaprio was cast in the role of con-artist Frank Abagnale, Jr. in the crime drama Catch Me If You Can, and starred in the historical drama Gangs of New York—films that earned him two nominations at the 2003 MTV Movie Awards.
DiCaprio was nominated for his first Academy Award, BAFTA Award and Critics' Choice Movie Award for Best Actor for his role as Howard Hughes in the biographical drama The Aviator (2004); he won a Golden Globe Award in the same category. For his next appearances—the crime drama The Departed (2006), the war thriller Blood Diamond (2006), the drama RI now have enough information to answer the question. 
Final Answer: Leonardo DiCaprio is currently single.

> Finished chain.
FritzGPT:

Leonardo DiCaprio is currently single.

One more thing I noticed: when the agent was a pirate, as in the blog post, I made the Wikipedia search return "I love rum". In that scenario I was able to force the agent to keep calling the tool instead of jumping to a conclusion; it reached the maximum number of retries and failed. But with the Wikipedia search working normally, it seems that when the observation contains some information related to the question (in my case, information about DiCaprio, even though it has nothing to do with his relationship status), the agent becomes more confident about jumping to a conclusion. Does this make any sense?
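To make the mismatch between the final answer and the observation concrete, here is a rough sketch of a lexical check. It is only a crude heuristic written for illustration, not a LangChain feature: the names `STOPWORDS`, `content_words`, and `unsupported_words` are hypothetical, and word overlap is a weak proxy for whether an answer is actually supported.

```python
import re
from typing import Iterable, Set

# Hypothetical, deliberately tiny stopword list; a real check would need more.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "to",
    "and", "or", "for", "with", "by", "at", "as", "it", "his", "her", "their",
})


def content_words(text: str) -> Set[str]:
    """Lowercase the text and return its alphanumeric words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_words(final_answer: str, observations: Iterable[str]) -> Set[str]:
    """Return content words of the answer that appear in no observation."""
    seen: Set[str] = set()
    for observation in observations:
        seen |= content_words(observation)
    return content_words(final_answer) - seen


if __name__ == "__main__":
    observations = ["Leonardo Wilhelm DiCaprio is an American actor and film producer."]
    missing = unsupported_words("Leonardo DiCaprio is currently single.", observations)
    # A non-empty set suggests the answer uses words the tool never returned.
    print(sorted(missing))
```

In the run above, the words "currently" and "single" never occur in the observation, so a check like this would flag the final answer; what to do next (retry, call the tool again, or report that the answer is unknown) is left open.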

Has anyone found a way to solve this?

lifan0127 commented 1 year ago

I have had a similar experience, where the agent abruptly jumped to conclusions. It seems to be quite random.

xinj7 commented 1 year ago

Similar problem here.

tmin97 commented 1 year ago

I'm facing a similar issue while using the initialize_agent function with AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION.

dosubot[bot] commented 11 months ago

Hi, @GabrielFritz. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on the information provided, it seems that the issue is related to the agent jumping to conclusions even when the necessary information is not available in the intermediate steps. Several other users have also reported similar experiences. However, the issue remains unresolved at this time.

If this issue is still relevant to the latest version of the LangChain repository, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself. Alternatively, if no action is taken, the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. Let us know if you have any further questions or concerns.

Dessix commented 11 months ago

Dosu, the way you've described this misconstrues the issue, as it implies that the agent is making irrational judgements, which would simply be a matter of training or prompt guidance.

The problem here is that the agent is requested to produce tokens even after it should have stopped, and continues to extrapolate its predicted answer to its query as part of the "log" it believes itself to be encountering. The agent could emit a stop token in this circumstance, but teaching a network to do that is significantly harder than using something like Microsoft/Guidance to ensure that the inference stops when it completes its evaluation.

More critically, parsing should not fail when this occurs; the library code here should be capable of noticing that the agent has jumped the gun, and of ignoring whatever tokens occur past what appears to be an erroneous prediction. As it stands, LangChain simply retries or outright fails, wasting energy, compute time, and potentially valuable responses which simply had excess enthusiasm.
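As a minimal sketch of what such tolerant parsing could look like, assuming the Thought/Action/Action Input/Observation text format from the original post: the code below is plain Python and does not use LangChain's classes. `Action`, `Finish`, `truncate_at_observation`, and `parse_step` are hypothetical names. It discards anything from the first model-written "Observation:" onward and prefers an action over a final answer when both appear in one completion.

```python
import re
from typing import NamedTuple, Union


class Action(NamedTuple):
    tool: str
    tool_input: str


class Finish(NamedTuple):
    output: str


# Accepts "Action Input" and "Action_Input"; the input is read to end of line.
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n+\s*Action[ _]Input\s*:\s*(.*)", re.IGNORECASE
)


def truncate_at_observation(llm_output: str) -> str:
    """Keep only the text before the first 'Observation:' the model wrote itself."""
    return re.split(r"\n\s*Observation\s*:", llm_output, maxsplit=1)[0]


def parse_step(llm_output: str) -> Union[Action, Finish]:
    """Parse one completion, ignoring tokens past a self-invented observation."""
    text = truncate_at_observation(llm_output)
    match = ACTION_RE.search(text)
    if match:
        # An action in the same completion as a final answer means the answer
        # was predicted before any real tool result, so the action wins.
        return Action(match.group(1).strip(), match.group(2).strip().strip('"'))
    if "Final Answer:" in text:
        return Finish(text.split("Final Answer:", 1)[1].strip())
    raise ValueError(f"Could not parse output: {llm_output!r}")


if __name__ == "__main__":
    overeager = (
        "Thought: I should look this up.\n"
        "Action: Wikipedia\n"
        "Action Input: Leo DiCaprio\n"
        "Observation: (invented by the model)\n"
        "Final Answer: Leonardo DiCaprio is currently single."
    )
    print(parse_step(overeager))
```

With this ordering, an overeager completion still yields a usable tool call instead of a parse failure or a fabricated answer. Multi-line action inputs would be cut at the first line, which is a limitation of this sketch.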

P.S. Whoever thought "stale bots" were a great idea, stop it. Just stop.

dosubot[bot] commented 11 months ago

@baskaryan Could you please help @Dessix with this issue? It seems that the agent is continuing to produce tokens even after it should have stopped, leading to parsing failures. @Dessix also suggests using Microsoft/Guidance to ensure that the inference stops when it completes its evaluation. Thank you!

dosubot[bot] commented 7 months ago

Hi, @GabrielFritz,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue addresses an agent's tendency to jump to conclusions without having the necessary information in intermediate steps, as observed in the provided code. There have been discussions involving other users, including lifan0127, xinj7, and tmin97, reporting similar experiences. The issue was marked as stale previously, but it seems that Dessix has clarified that the problem is not about irrational judgments but rather the agent's failure to stop producing tokens when it should. The suggested solution is to use Microsoft/Guidance to ensure the inference stops when it completes its evaluation. Dosubot has requested baskaryan to assist with the issue.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

langchain-ai / langchain

Agent hallucinates the final answer #2395