run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: ReACT agent talking to itself and hallucinating #14520

Open fawazPhc opened 2 weeks ago

fawazPhc commented 2 weeks ago

Question Validation

Question

I'm new to creating a custom ReACT agent to query my data using the RAG technique. It turns out that my locally running Llama2 7B model (using Ollama) does fine with questions about my data (I added the retriever as a QueryEngineTool), but for casual conversation it ends up calling the tool anyway and hallucinates. To work around this, I created a separate index for casual conversations and added its retriever as another query engine tool. As expected, when I start a casual conversation it does call the specified tool, but it later hallucinates by talking to itself (at least that's what I think is happening). Is it because I'm using a lightweight local model, or do I have to change my approach? Here is my code. Since the ReACT agent kept calling tools whenever I started a casual conversation, I had to add another index of casual conversations in the Pinecone db. The data I used is also attached.

I want the model to use its built-in knowledge for casual conversation and reply based on that instead of relying on a tool; even when it does rely on the tool, it doesn't use the information I provided. It's like the agent is using the information it retrieves as part of its internal thinking process and mistakenly treating it as a new input rather than forming a final response. You can view the attached screenshots for more clarity: the agent is able to respond to a "hi" using the tool, but it hallucinates when I ask "how are you doing?".

(Attached screenshots: Screenshot 2024-06-29 164412, Screenshot 2024-07-02 120600)

# Imports for the snippet below (assuming the llama-index >= 0.10 package layout)
from pinecone import Pinecone
from llama_index.core import VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key="...")  # Pinecone client; API key elided

index_name = "custom-data"
pinecone_index = pc.Index(index_name)
pinecone_index_2 = pc.Index("casualconversation")

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
vector_store_2 = PineconeVectorStore(pinecone_index=pinecone_index_2)

vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
vector_index_2 = VectorStoreIndex.from_vector_store(vector_store=vector_store_2)

llm = Ollama(model="llama3", request_timeout=1000000)
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)
retriever_2 = VectorIndexRetriever(index=vector_index_2, similarity_top_k=5)

tools = [
    QueryEngineTool(
        query_engine=RetrieverQueryEngine(retriever=retriever),
        metadata=ToolMetadata(
            name="phc_data",
            description="this has data for the company XYZ ",
        ),
    ),
    QueryEngineTool(
        query_engine=RetrieverQueryEngine(retriever=retriever_2),
        metadata=ToolMetadata(
            name="casualconversation",
            description="Use this whenever the user asks a casual question or for casual conversations",
        ),
    ),
]
agent = ReActAgent.from_tools(
    tools=tools,
    llm=llm,
    verbose=True,
    # context=context
)

response = agent.chat("Hi")
print(str(response))
response = agent.chat("How are you doing?")
print(str(response))
dosubot[bot] commented 2 weeks ago

Hey there, @fawazPhc! How can I assist you today?

The issue you're encountering with the ReActAgent hallucinating during casual conversations and confusing retrieved information with its internal knowledge can be addressed by refining the agent's configuration and handling of tool responses. Here are some steps to mitigate this issue:

  1. Adjust the Ollama Model Parameters: Fine-tune the parameters of the Ollama model to improve its handling of casual conversations. Lowering temperature makes the model's responses more deterministic, and context_window controls how much conversation history the model can attend to.

    llm = Ollama(model="llama3", request_timeout=1000000, temperature=0.5, context_window=1024)
  2. Use Node Postprocessors: Implement node postprocessors to filter and validate the retrieved nodes before they are synthesized into a response. This can help reduce hallucinations by ensuring only relevant information is used.

    from typing import List, Optional

    from llama_index.core.postprocessor.types import BaseNodePostprocessor
    from llama_index.core.schema import NodeWithScore, QueryBundle

    class CustomNodePostprocessor(BaseNodePostprocessor):
        def _postprocess_nodes(
            self,
            nodes: List[NodeWithScore],
            query_bundle: Optional[QueryBundle] = None,
        ) -> List[NodeWithScore]:
            # Implement custom filtering logic here
            return [node for node in nodes if self.is_relevant(node)]

        def is_relevant(self, node: NodeWithScore) -> bool:
            # Define relevance criteria (placeholder: accept all nodes)
            return True

    # Note: node postprocessors are applied by the query engine,
    # not the retriever, so pass them there instead:
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        node_postprocessors=[CustomNodePostprocessor()],
    )
    query_engine_2 = RetrieverQueryEngine(
        retriever=retriever_2,
        node_postprocessors=[CustomNodePostprocessor()],
    )
  3. Separate Contexts for Different Tools: Ensure that the agent maintains separate contexts for different tools to avoid confusion between casual conversation and data-specific queries. One option is a context-augmented agent, though note that ContextRetrieverOpenAIAgent requires an OpenAI function-calling LLM and will not work with Ollama as-is (a ReAct-compatible alternative is sketched after this list).

    from llama_index.core import Document
    from llama_index.agent.openai_legacy import ContextRetrieverOpenAIAgent
    
    context_texts = [
        "Casual conversation context: Greetings, small talk, etc.",
        "Data-specific context: Information about company XYZ, etc."
    ]
    context_docs = [Document(text=t) for t in context_texts]
    context_index = VectorStoreIndex.from_documents(context_docs)
    
    context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
        tools,
        context_index.as_retriever(similarity_top_k=1),
        verbose=True,
    )
  4. Verbose Mode for Debugging: Enable verbose mode (your original snippet already does) to get detailed logs and understand how the agent processes queries and selects tools. This can help identify and fix issues related to tool selection and response generation.

    agent = ReActAgent.from_tools(
        tools=tools,
        llm=llm,
        verbose=True,
    )
  5. Handle Tool Responses Carefully: Ensure that the agent correctly handles tool responses and integrates them into its reasoning process without confusing them with its internal knowledge. For reference, this is roughly the internal method the ReActAgent uses to parse each LLM output into a reasoning step; when a small model emits malformed Thought/Action text, parsing fails and the agent can wander (a practical guard is sketched after this list):

    # From llama_index's ReAct agent internals (simplified)
    def _extract_reasoning_step(self, output, is_streaming=False):
        if output.message.content is None:
            raise ValueError("Got empty message.")
        message_content = output.message.content
        current_reasoning = []
        try:
            reasoning_step = self._output_parser.parse(message_content, is_streaming)
        except BaseException as exc:
            raise ValueError(f"Could not parse output: {message_content}") from exc
        if self._verbose:
            print_text(f"{reasoning_step.get_content()}\n", color="pink")
        current_reasoning.append(reasoning_step)
    
        if reasoning_step.is_done:
            return message_content, current_reasoning, True
    
        reasoning_step = cast(ActionReasoningStep, reasoning_step)
        if not isinstance(reasoning_step, ActionReasoningStep):
            raise ValueError(f"Expected ActionReasoningStep, got {reasoning_step}")
    
        return message_content, current_reasoning, False
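One practical guard related to step 5: the ReAct loop is bounded by ReActAgent's max_iterations parameter, so a model that keeps emitting malformed reasoning steps errors out instead of spiraling indefinitely. A minimal sketch (the limit of 5 is an arbitrary illustrative value):

    agent = ReActAgent.from_tools(
        tools=tools,
        llm=llm,
        verbose=True,
        max_iterations=5,  # abort the Thought/Action loop after 5 steps
    )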

By following these steps, you can improve the handling of casual conversations and reduce hallucinations in the ReActAgent [1][2][3].
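A ReAct-compatible alternative to the OpenAI-only context agent in step 3 is the context argument that ReActAgent.from_tools already accepts (the commented-out context=context in the original snippet); it is folded into the agent's system prompt. A minimal sketch, where the wording of the context string is only an illustrative guess:

    context = (
        "You are an assistant for company XYZ. "
        "Use the phc_data tool only for questions about company data. "
        "For greetings and casual conversation, answer directly from your "
        "own knowledge without calling any tool."
    )

    agent = ReActAgent.from_tools(
        tools=tools,
        llm=llm,
        verbose=True,
        context=context,
    )

With an instruction like this in place, the separate casualconversation index may be unnecessary, which also removes one source of confusion for a 7B model.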

To continue talking to Dosu, mention @dosu.

VictorAny commented 1 week ago

+1 on this. I had to talk to the raw LLM and add my previous messages to the raw query string, instead of using ReActAgent, since it kept doing the weirdest, most unnecessary looping in response to a simple "Hi".

To add to this: the LLM handles the message just fine on its own, so it looks like the ReActAgent is what's causing the issues.
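For reference, a minimal sketch of that workaround, assuming the same Ollama llm as in the original snippet and using llama_index's chat interface rather than a hand-built query string:

    from llama_index.core.llms import ChatMessage
    from llama_index.llms.ollama import Ollama

    llm = Ollama(model="llama3", request_timeout=120)

    # Keep the conversation history manually and call the LLM directly,
    # bypassing the ReAct loop entirely for casual conversation.
    history = [ChatMessage(role="user", content="Hi")]
    response = llm.chat(history)
    print(response.message.content)

    history.append(response.message)
    history.append(ChatMessage(role="user", content="How are you doing?"))
    print(llm.chat(history).message.content)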