Closed: GxTeo closed this issue 5 months ago
🤖
Hey @GxTeo! Great to see you diving into another challenging project. How's everything going on your end?
To implement multi-document RAG with a Document Summary Index using the zephyr-7b-alpha LLM, you can use the SubQuestionQueryEngine class. This class generates a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.

First, define an index for each document/data source and wrap it with a QueryEngineTool:
```python
from llama_index.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(
            name="sept_22",
            description="Provides information about Uber quarterly financials ending September 2022",
        ),
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(
            name="june_22",
            description="Provides information about Uber quarterly financials ending June 2022",
        ),
    ),
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(
            name="march_22",
            description="Provides information about Uber quarterly financials ending March 2022",
        ),
    ),
]
```
Then, define a SubQuestionQueryEngine over these tools:
```python
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)
```
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
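To make the control flow concrete, here is a toy pure-Python sketch of the sub-question pattern. This is not LlamaIndex code: the `decompose` and `synthesize` functions below are hypothetical stand-ins for the LLM calls that plan sub-queries and synthesize the final answer.

```python
# Toy sketch of the sub-question pattern. In LlamaIndex an LLM plans the
# sub-queries and synthesizes the answer; here both steps are hard-coded
# stand-ins so the control flow is easy to see.

def decompose(query, tool_names):
    # Stand-in for the LLM "query plan": one sub-query per relevant tool.
    return [(name, f"{query} (scope: {name})") for name in tool_names]

def synthesize(query, sub_answers):
    # Stand-in for the LLM synthesis step.
    return f"Answer to {query!r} based on: " + "; ".join(sub_answers)

def sub_question_query(query, tools):
    # `tools` maps a tool name to a callable query engine.
    plan = decompose(query, list(tools))
    sub_answers = [tools[name](sub_q) for name, sub_q in plan]
    return synthesize(query, sub_answers)

tools = {
    "sept_22": lambda q: "Q3 revenue: $8.3B",
    "june_22": lambda q: "Q2 revenue: $8.1B",
}
print(sub_question_query("Compare Uber revenue across quarters", tools))
```

The real engine additionally lets the LLM decide which subset of tools each sub-query should hit, rather than fanning out to all of them.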
For complex queries, you can use a query transformation module within a ComposableGraph. This module will help break down a complex query into a simpler one over your existing index structure.
```python
from llama_index.indices.query.query_transform.base import (
    DecomposeQueryTransform,
)

decompose_transform = DecomposeQueryTransform(
    service_context.llm, verbose=True
)
```
Please note that the SubQuestionQueryEngine is constrained by the token generation limit of the underlying LLM. If your query exceeds this limit, you may need to break it down into smaller sub-queries. Unfortunately, I wasn't able to find an exact token limit within the LlamaIndex repository itself. You might want to experiment with different query lengths to find the limit empirically, or check the documentation of the underlying LLM (here, zephyr-7b-alpha) for its inherent context and generation limits.
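Lacking a documented limit, one workaround is to budget tokens yourself before anything reaches the engine. The sketch below is an assumption-laden illustration, not LlamaIndex code: it uses a crude whitespace word count as a token proxy (in practice you would use the model's real tokenizer) and greedily packs sentences into sub-queries that fit a budget.

```python
def rough_token_count(text):
    # Crude proxy: whitespace-separated words. Use the model's actual
    # tokenizer for accurate counts in practice.
    return len(text.split())

def split_query(query, max_tokens):
    # Greedily pack sentences into sub-queries under the token budget.
    sentences = [s.strip() + "." for s in query.split(".") if s.strip()]
    sub_queries, current = [], []
    for sentence in sentences:
        if current and rough_token_count(" ".join(current + [sentence])) > max_tokens:
            sub_queries.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        sub_queries.append(" ".join(current))
    return sub_queries

long_query = (
    "Summarize Uber's revenue for each quarter of 2022. "
    "Explain the main drivers behind any quarter-over-quarter changes. "
    "Compare mobility and delivery segment performance."
)
for sub in split_query(long_query, max_tokens=12):
    print(sub)
```

Each sub-query can then be sent to the engine separately and the partial answers combined in a final synthesis step.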
You can find more information about this in the LlamaIndex documentation.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
I haven't tried this before, but it may be worth a shot to use a prompt compression technique before synthesizing the final answer.
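To illustrate the idea, here is a toy extractive compression sketch: it keeps only the sentences most relevant to the query (scored by naive word overlap) before synthesis. Real tools such as LLMLingua do this far more carefully with a small language model, but the overall shape is similar. Everything here, including the example context, is illustrative.

```python
def compress_context(context, query, keep=2):
    # Score each sentence by word overlap with the query and keep the
    # top-`keep` sentences in their original order. A toy stand-in for
    # learned compressors such as LLMLingua.
    query_words = set(query.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    scored = sorted(
        range(len(sentences)),
        key=lambda i: len(query_words & set(sentences[i].lower().split())),
        reverse=True,
    )
    kept = sorted(scored[:keep])
    return ". ".join(sentences[i] for i in kept) + "."

context = (
    "Uber reported revenue of 8300 million dollars in Q3 2022. "
    "The company was founded in 2009. "
    "Revenue grew 72 percent year over year in Q3 2022. "
    "Uber is headquartered in San Francisco."
)
print(compress_context(context, "What was Uber revenue in Q3 2022"))
```

Compressing the retrieved context this way leaves more of the LLM's token budget free for generating the synthesized answer.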
Hi, @GxTeo,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you are trying to implement multi-document RAG with a Document Summary Index using the zephyr-7b-alpha LLM, and you encountered a limitation with the number of tokens the LLM can generate when using the SubQuestionQueryEngine. In response, I provided guidance on using the SubQuestionQueryEngine class to generate sub-queries against sub-documents and suggested using a query transformation module within a ComposableGraph for complex queries. Additionally, nerdai recommended trying a prompt compression technique and shared an LLMLingua example.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.
Question Validation
Question
Using an open-source LLM, zephyr-7b-alpha, how can I do multi-document RAG, especially in my use case where there are complex queries that may require me to decompose the query, query multiple documents, and then synthesize the final response? I tried SubQuestionQueryEngine, but I am limited by the number of tokens that can be generated by the LLM.