run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Using custom retriever in a chat engine - Azure AI Search #14215

Open nshern opened 1 week ago

nshern commented 1 week ago

Question

In the example "Azure AI Search", the end result is a retriever that performs hybrid search with semantic reranking. It is not clear to me how to use this in a chat engine. How would I go about this? Should the retriever be passed into the chat engine?

dosubot[bot] commented 1 week ago

To use a custom retriever with Hybrid Search and Semantic Reranking in a chat engine, you should pass the retriever into the chat engine. Here's an example of how you can achieve this:

  1. Define the Chat Engine: Create a chat engine class that accepts a retriever and uses it to perform retrieval operations.

     from llama_index.core import QueryBundle

     class ChatEngine:
         def __init__(self, retriever):
             self.retriever = retriever

         def query(self, query_str):
             query_bundle = QueryBundle(query_str)
             nodes = self.retriever.retrieve(query_bundle)
             # Hybrid search and semantic reranking happen inside the retriever;
             # post-process the nodes or synthesize a response here
             # ...
             return nodes

         async def aquery(self, query_str):
             query_bundle = QueryBundle(query_str)
             nodes = await self.retriever.aretrieve(query_bundle)
             # Same as above, using the retriever's async API
             # ...
             return nodes
  2. Initialize the Custom Retriever: Replace YourCustomRetriever with your specific retriever implementation, e.g. the hybrid retriever from the Azure AI Search example (a sketch follows this list).

    retriever = YourCustomRetriever()  # Replace with your specific retriever implementation
    chat_engine = ChatEngine(retriever)
  3. Perform Queries: Use the query and aquery methods to perform synchronous and asynchronous queries, respectively.

    # Synchronous query
    response = chat_engine.query("Your query here")
    
    # Asynchronous query
    response = await chat_engine.aquery("Your query here")
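
Since the question is specifically about the "Azure AI Search" example, here is a minimal sketch of how the hybrid retriever from that example could be plugged in. It assumes index is the VectorStoreIndex already built on top of AzureAISearchVectorStore in that notebook, and similarity_top_k=5 is only an illustrative value:

from llama_index.core.vector_stores.types import VectorStoreQueryMode

# Assumes `index` is the VectorStoreIndex built over AzureAISearchVectorStore
# in the "Azure AI Search" example; hybrid search plus Azure's semantic
# reranking is selected via the query mode.
hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID,
    similarity_top_k=5,
)

chat_engine = ChatEngine(hybrid_retriever)
response = chat_engine.query("Your query here")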

For a more advanced setup, you can use the ContextChatEngine class from the LlamaIndex framework, which integrates a retriever to retrieve context, sets the context in the system prompt, and then uses an LLM to generate a response.

Here is an example of how to set up the ContextChatEngine:

from llama_index.core.chat_engine.context import ContextChatEngine
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.llms.llm import LLM
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.callbacks import CallbackManager

# Initialize your retriever and LLM
retriever = YourCustomRetriever()
llm = YourCustomLLM()

# Initialize the chat engine with the retriever and LLM
chat_engine = ContextChatEngine(
    retriever=retriever,
    llm=llm,
    memory=ChatMemoryBuffer.from_defaults(),
    prefix_messages=[],
    node_postprocessors=[],
    context_template=None,
    callback_manager=CallbackManager([])
)

# Synchronous chat
response = chat_engine.chat("Your query here")

# Asynchronous chat
response = await chat_engine.achat("Your query here")

This setup lets the chat engine ground its responses in the nodes returned by your custom hybrid retriever, with semantic reranking applied at retrieval time [1][2][3][4].
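
If you prefer less manual wiring, a more compact variant is the from_defaults constructor. This is a sketch assuming your installed version accepts the llm and memory keywords shown, and that retriever and llm are the objects created above; the token_limit is an illustrative value:

from llama_index.core.chat_engine import ContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer

# Sketch: `retriever` and `llm` are the objects created above; from_defaults
# fills in the remaining arguments (prompts, postprocessors, callbacks).
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    llm=llm,
    memory=ChatMemoryBuffer.from_defaults(token_limit=3000),
)

response = chat_engine.chat("Your query here")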

To continue talking to Dosu, mention @dosu.