run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How to do Multi-Document RAG with Document Summary Index? #10389

Closed: GxTeo closed this issue 5 months ago

GxTeo commented 8 months ago

Question

Using the open-source LLM zephyr-7b-alpha, how can I do multi-document RAG? In my use case there are complex queries that may require decomposing the query, querying multiple documents, and then synthesizing the final response.

I tried SubQuestionQueryEngine, but I am limited by the number of tokens the LLM can generate.

dosubot[bot] commented 8 months ago

🤖

Hey @GxTeo! Great to see you diving into another challenging project. How's everything going on your end?

To implement Multi-Document RAG with Document Summary Index using the zephyr-7b-alpha LLM, you can use the SubQuestionQueryEngine class. This class generates a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.

First, define an index for each document/data source, and wrap it with a QueryEngineTool:

from llama_index.tools import QueryEngineTool, ToolMetadata

# sept_engine, june_engine, and march_engine are assumed to be query
# engines built over each quarterly report's index, defined elsewhere
query_engine_tools = [
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(
            name="sept_22",
            description="Provides information about Uber quarterly financials ending September 2022",
        ),
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(
            name="june_22",
            description="Provides information about Uber quarterly financials ending June 2022",
        ),
    ),
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(
            name="march_22",
            description="Provides information about Uber quarterly financials ending March 2022",
        ),
    ),
]
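
Since the question is specifically about the Document Summary Index, each per-document engine above (sept_engine, june_engine, march_engine) can be built over a DocumentSummaryIndex. A minimal sketch for one of them, assuming the filing is a set of local files (the path is illustrative):

from llama_index import DocumentSummaryIndex, SimpleDirectoryReader

# Load one quarterly report and build a summary index over it
sept_docs = SimpleDirectoryReader("./data/uber_sept_2022").load_data()
sept_index = DocumentSummaryIndex.from_documents(sept_docs)
sept_engine = sept_index.as_query_engine(response_mode="tree_summarize")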

Then, define a SubQuestionQueryEngine over these tools:

from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)

This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
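
For example, a compare/contrast query that touches all three tools:

response = query_engine.query(
    "Compare Uber's revenue growth across the March, June, and September 2022 quarters"
)
print(response)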

For complex queries, you can use a query transformation module within a ComposableGraph. This module breaks a complex query down into simpler queries over your existing index structure.

from llama_index.indices.query.query_transform.base import (
    DecomposeQueryTransform,
)

# service_context is assumed to be configured with your LLM
decompose_transform = DecomposeQueryTransform(
    service_context.llm, verbose=True
)
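
One way to apply the transform is to wrap an existing query engine with TransformQueryEngine (a sketch; sept_engine stands in for any of the per-document engines defined above):

from llama_index.query_engine import TransformQueryEngine

# Decompose each incoming query before it reaches the wrapped engine
transformed_engine = TransformQueryEngine(
    sept_engine, query_transform=decompose_transform
)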

Please note that the token limit you are hitting comes from the underlying LLM rather than from SubQuestionQueryEngine itself; LlamaIndex does not impose its own cap. If your query exceeds the model's limit, you may need to break it down into smaller sub-queries. The exact limit is not defined in the LlamaIndex repository because it is a property of the model, so check zephyr-7b-alpha's documentation for its context window and generation limits, or find them empirically.
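
If you are running zephyr-7b-alpha locally through HuggingFaceLLM, these limits are set when constructing the LLM; a sketch with illustrative values (tune them to the model's actual context size and your hardware):

from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=3900,  # tokens reserved for the prompt
    max_new_tokens=256,  # tokens the model may generate per call
)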

You can find more information about this in the LlamaIndex documentation.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

nerdai commented 8 months ago

I haven't tried this before, but it may be worth a shot to use a prompt compression technique before synthesizing the final answer.

LLMLingua example
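
For reference, LlamaIndex ships an LLMLingua integration as a node postprocessor that compresses retrieved context before the synthesis step. A sketch based on that integration (requires pip install llmlingua; retriever is assumed to come from one of your indexes, e.g. index.as_retriever(), and the parameter values are illustrative):

from llama_index.postprocessor import LongLLMLinguaPostprocessor
from llama_index.query_engine import RetrieverQueryEngine

# Compress retrieved nodes before they are sent to the LLM
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
)

query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)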

dosubot[bot] commented 5 months ago

Hi, @GxTeo,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you are trying to implement multi-document RAG with a document summary index using the zephyr-7b-alpha LLM, and you ran into the LLM's token-generation limit when using the SubQuestionQueryEngine. In response, I provided guidance on using the SubQuestionQueryEngine class to generate sub-queries against sub-documents and suggested using a query transformation module within a ComposableGraph for complex queries. Additionally, nerdai recommended trying a prompt compression technique and shared an LLMLingua example.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.