run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Agent using multiple indices of the same documents to answer different types of questions #11758

Closed brycecf closed 3 months ago

brycecf commented 7 months ago

Question

I have a ReAct agent that needs to answer summarization questions and also questions with detail over the same documents. Consequently, I want to use both a VectorStoreIndex and DocumentSummaryIndex to support these situations, which the ReAct agent will choose based on the type of question. Is that the recommended way to do this for a RAG application in LlamaIndex or is there an alternative? Provide an example.

dosubot[bot] commented 7 months ago

Hey there, @brycecf! Nice to see you back with another interesting question. Hope you're doing awesome!

Yes, using both a VectorStoreIndex and a DocumentSummaryIndex over the same documents, and letting a ReAct agent choose between them, is a recommended pattern for a RAG application in LlamaIndex. The dual-index strategy plays to the strengths of each index: the VectorStoreIndex retrieves fine-grained chunks by embedding similarity, which suits detail-oriented questions, while the DocumentSummaryIndex stores an LLM-generated summary per document and retrieves at the document level, which suits summarization questions. Together they make the system more flexible across query types.

Here's an example of how to construct and query a VectorStoreIndex:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

# Construct vector store and customize storage context
storage_context = StorageContext.from_defaults(
    vector_store=DeepLakeVectorStore(dataset_path="<dataset_path>")
)

# Load documents and build index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Query index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

LlamaIndex also provides a DocumentSummaryIndex directly in llama_index.core, so you do not need to build a separate summary store yourself. Unlike the VectorStoreIndex, it calls an LLM at build time to generate a summary for each document and uses those summaries during retrieval, which makes it well suited to high-level summarization questions over the same corpus.

For more details on VectorStoreIndex, you can refer to the vector_stores.md document in the LlamaIndex repository.

As for the DocumentSummaryIndex, it is implemented as a Python class named IndexDocumentSummary, which inherits from IndexStruct. This class is designed to manage a mapping between summary nodes and document nodes, facilitating the organization and retrieval of document summaries and their associated nodes. You can find more details about its implementation in the document_summary.py file in the LlamaIndex repository.

In summary, building both index types over the same documents and exposing each as a tool gives the ReAct agent a versatile retrieval system: it can route detail questions to the vector index and summarization questions to the summary index based on the tool descriptions.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.