I found some relevant discussions and issues that might help you with incorporating history in a custom query engine or using a custom query engine with a chat engine.
To incorporate history in a custom query engine, or to use a custom query engine with a chat engine, you can use the `CondenseQuestionChatEngine` class from the `llama_index` library. This class generates a standalone question from the conversation context and the last message, and then queries the query engine for a response. Here is an example of how you can set it up:
```python
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.llms.openai import OpenAI  # any concrete LLM works; the LLM base class is abstract


# Define your custom query engine; subclassing CustomQueryEngine only
# requires implementing custom_query()
class MyQueryEngine(CustomQueryEngine):
    def custom_query(self, query_str: str) -> str:
        # Implement your custom query logic here
        return f"Answer for: {query_str}"


# Initialize the components
query_engine = MyQueryEngine()
condense_question_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)
memory = ChatMemoryBuffer.from_defaults()
llm = OpenAI()  # swap in your LLM of choice

# Create the CondenseQuestionChatEngine
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=condense_question_prompt,
    memory=memory,
    llm=llm,
    verbose=True,
)

# Example usage
response = chat_engine.chat("Your follow-up message here")
print(response)
```
This setup ensures that the chat engine uses the conversation history to generate a standalone question and then queries the custom query engine for a response. The `ChatMemoryBuffer` is used to store and manage the chat history [6][7].
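If you need to bound how much history gets replayed into the condense prompt, `ChatMemoryBuffer.from_defaults` accepts a `token_limit`. A small sketch, with an illustrative limit:

```python
from llama_index.core.memory import ChatMemoryBuffer

# Keep roughly the last 3000 tokens of conversation (illustrative value)
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
```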
Alternatively, you can use a high-level API to configure the chat engine more easily:
```python
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
response = chat_engine.chat("Your follow-up message here")
print(response)
```
This high-level API simplifies the setup while still letting you specify the `condense_question` mode for handling conversation history [7].
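For completeness, the `index` above can come from the standard ingestion path. A minimal sketch, assuming your documents live in a local `data/` folder:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents and build a vector index (assumed setup)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
```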
Additionally, you can initialize the `CondenseQuestionChatEngine` with an initial chat history, as shown in the following test example:
```python
from unittest.mock import Mock

from llama_index.core.base.base_query_engine import BaseQueryEngine
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.core.base.response.schema import Response
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine


def test_condense_question_chat_engine_with_init_history() -> None:
    # The mocked query engine simply echoes whatever condensed question it receives
    query_engine = Mock(spec=BaseQueryEngine)
    query_engine.query.side_effect = lambda x: Response(response=x)

    engine = CondenseQuestionChatEngine.from_defaults(
        query_engine=query_engine,
        chat_history=[
            ChatMessage(role=MessageRole.USER, content="test human message"),
            ChatMessage(role=MessageRole.ASSISTANT, content="test ai message"),
        ],
    )

    response = engine.chat("new human message")
    # The expected string below reflects the test suite's mock LLM, which echoes
    # its prompt arguments back instead of generating text
    assert str(response) == (
        "{'question': 'new human message', 'chat_history': 'user: test human "
        "message\\nassistant: test ai message'}"
    )
```
This example demonstrates how to initialize the chat engine with a predefined chat history, which can be useful for maintaining context across sessions [8][9].
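Outside of tests, the same `chat_history` argument can seed a real engine with messages persisted from an earlier session. A sketch reusing the `query_engine` and `llm` from above; the messages themselves are made up:

```python
from llama_index.core.base.llms.types import ChatMessage, MessageRole

# Hypothetical messages restored from a previous session
prior_history = [
    ChatMessage(role=MessageRole.USER, content="What does the warranty cover?"),
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content="It covers manufacturing defects for two years.",
    ),
]

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    chat_history=prior_history,
    llm=llm,
)
```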
We're building a RAG chatbot, so a retriever is needed, but it's missing in your response.
To incorporate a retriever into a RAG chatbot using the `CondenseQuestionChatEngine`, follow these steps:
1. Create a retriever from an index:

   ```python
   retriever = index.as_retriever()
   ```

2. Configure the retriever if needed; for example, a `SummaryIndex` retriever:

   ```python
   retriever = summary_index.as_retriever(
       retriever_mode="llm",
       choice_batch_size=5,
   )
   ```

3. Wrap the retriever in a query engine and hand that to the `CondenseQuestionChatEngine` (the engine takes a `query_engine`, not a bare retriever):

   ```python
   from llama_index.core.chat_engine import CondenseQuestionChatEngine
   from llama_index.core.query_engine import RetrieverQueryEngine

   query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)
   chat_engine = CondenseQuestionChatEngine.from_defaults(
       query_engine=query_engine, llm=llm
   )
   ```
Alternatively, you can create the `CondenseQuestionChatEngine` directly from the index:
```python
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
```
This setup ensures that the `CondenseQuestionChatEngine` is properly backed by a retriever and can handle user queries effectively [1].
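Putting the pieces together, a short usage sketch (the questions are invented) shows the history being carried into a follow-up turn:

```python
# First turn: a self-contained question
print(chat_engine.chat("What embedding model does the project use?"))

# Second turn: only makes sense given the history; the engine condenses it
# into a standalone question before hitting the retriever
print(chat_engine.chat("And can that model be swapped out?"))
```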
@Tejaswgupta use a chat engine -- I prefer the `CondensePlusContextChatEngine`

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine

chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, llm=llm, ...)
```
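Filling in the elided arguments, a minimal sketch might look like this; the token limit and prompt text are illustrative assumptions, while `memory` and `system_prompt` are real `from_defaults` parameters:

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    llm=llm,
    memory=ChatMemoryBuffer.from_defaults(token_limit=4000),  # illustrative limit
    system_prompt="Answer using only the retrieved context.",  # illustrative prompt
    verbose=True,
)
```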
@logan-markewich I had to use `HyDEQueryTransform` but couldn't find any way to incorporate that with history. I also need a system prompt, because the queries are pretty complex.
Hmm, tbh I would just build my own thing from lower-level abstractions if you need maximum flexibility
This is what workflows are made for https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows
Something like a function-calling agent is one example of this, but probably more complicated than what your needs currently are. https://colab.research.google.com/drive/1wVCkvX7oQu1ZwrMSAyaJ8QyzHyfR0D_j?usp=sharing
But, more than enough in both those links to build out something cool
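That said, before reaching for a full workflow, HyDE can be composed with the condense-question flow out of the pieces already in this thread: wrap the query engine in a `TransformQueryEngine` so the condensed standalone question is HyDE-transformed before retrieval. A sketch, assuming the `query_engine` and `llm` from earlier:

```python
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# HyDE rewrites the standalone question into a hypothetical answer for retrieval
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, query_transform=hyde)

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=hyde_query_engine,
    llm=llm,
    verbose=True,
)
```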
Question
I need to use a custom query engine with history. The only two options I see are incorporating history manually with the query engine, or using the custom query engine with a chat engine. I couldn't find resources for either.