run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Incorporate history in query engine #16303

Closed Tejaswgupta closed 1 week ago

Tejaswgupta commented 1 week ago

Question

I need to use a custom query engine with history. The only two options seem to be incorporating history manually with the query engine, or using the custom query engine inside a chat engine. I couldn't find resources for either of these.

dosubot[bot] commented 1 week ago

I found some relevant discussions and issues that might help you with incorporating history in a custom query engine or using a custom query engine with a chat engine.

To incorporate history in a custom query engine or use a custom query engine with a chat engine, you can use the CondenseQuestionChatEngine class from the llama_index library. This class allows you to generate a standalone question from the conversation context and the last message, and then query the query engine for a response. Here is an example of how you can set it up:

```python
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.llms.openai import OpenAI

# Define your custom query engine by subclassing CustomQueryEngine
# and implementing custom_query()
class MyQueryEngine(CustomQueryEngine):
    def custom_query(self, query_str: str) -> str:
        # Implement your custom query logic here and return a string
        # (or a Response object)
        return f"Answer for: {query_str}"

# Initialize the components
query_engine = MyQueryEngine()
condense_question_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)
memory = ChatMemoryBuffer.from_defaults()
llm = OpenAI()  # requires llama-index-llms-openai; swap in any other supported LLM

# Create the CondenseQuestionChatEngine
chat_engine = CondenseQuestionChatEngine(
    query_engine=query_engine,
    condense_question_prompt=condense_question_prompt,
    memory=memory,
    llm=llm,
    verbose=True
)

# Example usage
response = chat_engine.chat("Your follow-up message here")
print(response)
```

This setup ensures that the chat engine uses the conversation history to generate a standalone question and then queries the custom query engine for a response. The ChatMemoryBuffer is used to store and manage the chat history [6][7].
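
If you need to bound how much history is kept, the memory buffer can be given a token limit (the value below is illustrative):

```python
from llama_index.core.memory import ChatMemoryBuffer

# Keep roughly the most recent 3000 tokens of conversation history
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
```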

Alternatively, you can use a high-level API to configure the chat engine more easily:

```python
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
response = chat_engine.chat("Your follow-up message here")
print(response)
```

This high-level API simplifies the setup process while still allowing you to specify the condense_question mode for handling conversation history [7].
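
For completeness, a sketch of where `index` might come from, assuming your documents live in a local directory (the `./data` path is illustrative):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents and build a vector index over them
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
print(chat_engine.chat("Your follow-up message here"))
```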

Additionally, you can initialize the CondenseQuestionChatEngine with an initial chat history, as shown in the following test example:

```python
from unittest.mock import Mock

from llama_index.core.base.base_query_engine import BaseQueryEngine
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.core.base.response.schema import Response
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine

# (in the llama_index test suite this runs against a mocked LLM fixture)
def test_condense_question_chat_engine_with_init_history() -> None:
    query_engine = Mock(spec=BaseQueryEngine)
    query_engine.query.side_effect = lambda x: Response(response=x)
    engine = CondenseQuestionChatEngine.from_defaults(
        query_engine=query_engine,
        chat_history=[
            ChatMessage(role=MessageRole.USER, content="test human message"),
            ChatMessage(role=MessageRole.ASSISTANT, content="test ai message"),
        ],
    )

    response = engine.chat("new human message")
    assert str(response) == (
        "{'question': 'new human message', 'chat_history': 'user: test human "
        "message\\nassistant: test ai message'}"
    )
```

This example demonstrates how to initialize the chat engine with a predefined chat history, which can be useful for maintaining context across sessions [8][9].
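
For example, one way to carry history across sessions is to save the messages at the end of one session and rebuild them at the start of the next. A minimal sketch, assuming `chat_engine` and `query_engine` are set up as above (the file name is illustrative):

```python
import json

from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine

# End of session: save role/content pairs from the engine's history
saved = [
    {"role": m.role.value, "content": m.content} for m in chat_engine.chat_history
]
with open("chat_history.json", "w") as f:
    json.dump(saved, f)

# Start of a later session: rebuild the messages and pass them in
with open("chat_history.json") as f:
    loaded = json.load(f)
chat_history = [
    ChatMessage(role=MessageRole(m["role"]), content=m["content"]) for m in loaded
]
engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, chat_history=chat_history
)
```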

To continue talking to Dosu, mention @dosu.

Tejaswgupta commented 1 week ago

We're building a RAG chatbot, so a retriever is needed, but it's missing from your response.

dosubot[bot] commented 1 week ago

To incorporate a retriever into a RAG chatbot using the CondenseQuestionChatEngine, follow these steps:

  1. Create a retriever from an index:

    retriever = index.as_retriever()
  2. Configure the retriever if needed:

    retriever = summary_index.as_retriever(
        retriever_mode="llm",
        choice_batch_size=5,
    )
  3. Wrap the retriever in a query engine and pass that to the CondenseQuestionChatEngine (it expects a query engine, not a retriever; a complete end-to-end sketch follows this list):

    from llama_index.core.chat_engine import CondenseQuestionChatEngine
    from llama_index.core.query_engine import RetrieverQueryEngine

    query_engine = RetrieverQueryEngine.from_args(retriever)
    chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=query_engine)
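
Putting the pieces together, a minimal end-to-end sketch, assuming `index` is an existing VectorStoreIndex (the retrieval settings are illustrative):

```python
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.query_engine import RetrieverQueryEngine

# Turn the index into a retriever, wrap it in a query engine,
# and hand that query engine to the chat engine
retriever = index.as_retriever(similarity_top_k=4)
query_engine = RetrieverQueryEngine.from_args(retriever)
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, verbose=True
)

print(chat_engine.chat("Your follow-up message here"))
```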

Alternatively, you can directly create the CondenseQuestionChatEngine from the index:

```python
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
```

This setup ensures that the CondenseQuestionChatEngine is properly configured with a retriever to handle user queries effectively [1].

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 week ago

@Tejaswgupta use a chat engine -- I prefer the CondensePlusContextChatEngine

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine

chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, llm=llm, ....)
```
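
Filled out a bit more, a sketch of what that call might look like with a memory buffer and a system prompt (argument names as in recent llama_index versions, so double-check against your installed version; the prompt text and token limit are illustrative):

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer

# retriever and llm are assumed to be built already
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    llm=llm,
    memory=memory,
    system_prompt="You are a helpful assistant that answers from the retrieved context.",
    verbose=True,
)
print(chat_engine.chat("Your follow-up message here"))
```
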
Tejaswgupta commented 1 week ago

@logan-markewich I had to use HyDEQueryTransform but couldn't find any way to incorporate that with history. I also need a system prompt, because the queries are pretty complex.

logan-markewich commented 1 week ago

Hmm, tbh I would just build my own thing from lower-level abstractions if you need maximum flexibility

This is what workflows are made for https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows
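
For illustration, a rough sketch of what such a workflow could look like for this use case: condense the history into a standalone question, do HyDE-style retrieval, then answer under a system prompt. All class names and prompt text here are made up for the sketch, and `llm` and `retriever` are assumed to be built elsewhere:

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step


class RetrieveEvent(Event):
    question: str


class SynthesizeEvent(Event):
    question: str
    context: str


SYSTEM_PROMPT = "You are a careful assistant. Answer only from the provided context."


class ChatRAGWorkflow(Workflow):
    """Rough sketch: condense history, HyDE-style retrieve, then answer."""

    def __init__(self, llm, retriever, **kwargs):
        super().__init__(**kwargs)
        self.llm = llm
        self.retriever = retriever

    @step
    async def condense(self, ev: StartEvent) -> RetrieveEvent:
        # Rewrite the follow-up message as a standalone question using the history
        history = "\n".join(f"{m.role.value}: {m.content}" for m in ev.chat_history)
        prompt = (
            "Given the chat history and a follow-up message, rewrite the message "
            "as a standalone question.\n\n"
            f"{history}\n\nFollow-up: {ev.message}\n\nStandalone question:"
        )
        standalone = await self.llm.acomplete(prompt)
        return RetrieveEvent(question=str(standalone).strip())

    @step
    async def retrieve(self, ev: RetrieveEvent) -> SynthesizeEvent:
        # HyDE-style retrieval: draft a hypothetical answer and retrieve with it
        hypothetical = await self.llm.acomplete(
            f"Write a short passage that answers: {ev.question}"
        )
        nodes = await self.retriever.aretrieve(str(hypothetical))
        context = "\n\n".join(n.get_content() for n in nodes)
        return SynthesizeEvent(question=ev.question, context=context)

    @step
    async def synthesize(self, ev: SynthesizeEvent) -> StopEvent:
        # Answer the standalone question from the retrieved context, under a system prompt
        answer = await self.llm.acomplete(
            f"{SYSTEM_PROMPT}\n\nContext:\n{ev.context}\n\n"
            f"Question: {ev.question}\nAnswer:"
        )
        return StopEvent(result=str(answer))


# Usage (inside an async context), with llm and retriever built elsewhere:
# workflow = ChatRAGWorkflow(llm=llm, retriever=retriever, timeout=60)
# history = [ChatMessage(role="user", content="..."),
#            ChatMessage(role="assistant", content="...")]
# print(await workflow.run(message="follow-up question", chat_history=history))
```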

Something like a function-calling agent is one example of this, but it's probably more complicated than what you currently need. https://colab.research.google.com/drive/1wVCkvX7oQu1ZwrMSAyaJ8QyzHyfR0D_j?usp=sharing

But, more than enough in both those links to build out something cool