run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

https://docs.llamaindex.ai

MIT License


[Question]: Router Query Engine Examples #14994

Open gmdl007 opened 3 months ago

gmdl007 commented 3 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

I found this config in the Router Query Engine example. As I understand it, the description is the text that determines which tool the router routes a query to. Given that, do the descriptions I have written below make sense? I find that sometimes the router picks the wrong tool. Is that because my descriptions are not extensive enough?

from llama_index.core.tools import QueryEngineTool

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description=(
        "Useful for summarization task"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific information task"
    ),
)

Original:

from llama_index.core.tools import QueryEngineTool

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description=(
        "Useful for summarization questions related to Paul Graham essay on"
        " What I Worked On."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from Paul Graham essay on What"
        " I Worked On."
    ),
)

To be honest, I didn't understand why the text is separated into two parts.

Cheers Guan

dosubot[bot] commented 3 months ago

Moreover, the LlamaIndex core tests indicate that there is a maximum length for descriptions (1024 characters for OpenAI tools). While it is important to be detailed, you should also ensure that your descriptions do not exceed this limit, as exceeding it can cause errors. If you have only one index, then you only need one tool with an appropriate description ^[4].
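A quick pre-flight check can catch an over-long description before it is passed to a tool. The sketch below is plain Python and does not call LlamaIndex; the names MAX_DESCRIPTION_CHARS and check_description are hypothetical, and the 1024 figure is simply the limit quoted above.

```python
# Hypothetical helper: the limit value is the one quoted in this thread,
# the names are made up for illustration.
MAX_DESCRIPTION_CHARS = 1024


def check_description(description: str, limit: int = MAX_DESCRIPTION_CHARS) -> str:
    """Return the description unchanged, or raise if it is over the limit."""
    if len(description) > limit:
        raise ValueError(
            f"description is {len(description)} characters; limit is {limit}"
        )
    return description


# A short description passes through untouched.
short = check_description("Useful for summarization questions.")
print(len(short))
```

Running the check where the descriptions are defined keeps the failure close to the text that caused it.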

In your case, the original descriptions are more specific and provide context about the Paul Graham essay, which likely helps the Router Query Engine make better decisions. The revised descriptions are more generic and may not provide enough context for accurate routing.

Regarding the text being separated into two parts, this is likely for readability and to adhere to line length conventions in code. It does not affect the functionality.
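To illustrate the point about the split text: Python joins adjacent string literals inside parentheses into one string, so the two-part form and the one-line form are identical. A small sketch using the description from the example (the variable names are only for illustration):

```python
# Adjacent string literals inside parentheses are joined into one string.
split_description = (
    "Useful for summarization questions related to Paul Graham essay on"
    " What I Worked On."
)

# The same text written on a single line.
single_description = "Useful for summarization questions related to Paul Graham essay on What I Worked On."

# Both forms produce exactly the same string.
print(split_description == single_description)
```

Note the leading space in the second part: without it the two words at the join would run together.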

To continue talking to Dosu, mention @dosu.

gmdl007 commented 3 months ago

OK, I see, thanks. As for the descriptions: if I have multiple documents that discuss quite different topics, how would you write the descriptions? Do you have to make sure each index relates to only a single document?

Best Guan

On Saturday, July 27, 2024 at 9:49 AM, dosubot[bot] wrote:

Yes, the issues you are experiencing with the tool making incorrect decisions could be due to your descriptions not being extensive enough. According to a message from Logan M, "Which tool it picks is dependant on the description alone." This implies that routing accuracy depends heavily on the quality and detail of the descriptions provided. Additionally, the LlamaIndex documentation highlights the importance of detailed descriptions, especially when dealing with complex query plans. If the descriptions are too brief or lack necessary details, the tool may not function as intended ^[1]^[2]^[3]^[4].

[1] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_05_25_23.json
[2] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_06_02_23.json
[3] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/agent/openai_agent_lengthy_tools.ipynb
[4] https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/tools/test_types.py

dosubot[bot] commented 3 months ago

To ensure accurate routing by the Router Query Engine in LlamaIndex, write descriptions that cover the specific tasks each query engine is designed to handle. An index does not need to be limited to a single document, but each description should state clearly and concisely which kinds of tasks its tool is suited for.

Given the context of your current setup, here is how you can approach crafting these descriptions:

Summarization Task: This description should highlight that the tool is optimized for summarizing content from multiple documents.
Retrieving Specific Information Task: This description should emphasize that the tool is designed to fetch precise information from a collection of documents.

Here is an example of how you can balance detail and brevity in your descriptions:

from llama_index.core.tools import QueryEngineTool

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description=(
        "Useful for summarizing content from multiple documents, providing concise overviews and key points."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific information from a collection of documents, focusing on precise and detailed answers."
    ),
)

By clearly defining the purpose and strengths of each tool, you can ensure that the Router Query Engine routes queries accurately, even when dealing with multiple documents on different topics.
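When the documents cover different topics, one option is to name the topic in each description so the router has something concrete to match against. The sketch below only builds the description strings; make_descriptions and the topic names are hypothetical, and the resulting strings would be passed as the description argument of QueryEngineTool.from_defaults.

```python
def make_descriptions(topic: str) -> dict:
    """Build summary/lookup descriptions that name the topic explicitly."""
    return {
        "summary": f"Useful for summarization questions about {topic}.",
        "lookup": f"Useful for retrieving specific facts about {topic}.",
    }


# Hypothetical topics, one per group of documents.
topics = ["IPv6 addressing", "BGP routing"]
descriptions = {topic: make_descriptions(topic) for topic in topics}

for topic, pair in descriptions.items():
    print(topic, "->", pair["summary"])
```

Generating the text from one template keeps the wording consistent across tools, so the topic is the only thing that differs between descriptions.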

Additionally, if you find that your descriptions are exceeding the 1024 character limit imposed by OpenAI tools, consider moving the tool descriptions to the prompt. This approach allows you to keep the query plan description fixed in size while providing detailed tool descriptions alongside the query. Here’s an example of how to implement this:

from llama_index.core.tools.types import ToolMetadata

introductory_tool_description_prefix = """\
This is a query plan tool that takes in a list of tools and executes a \
query plan over these tools to answer a query. The query plan is a DAG of query nodes.

Given a list of tool names and the query plan schema, you \
can choose to generate a query plan to answer a question.

The tool names and descriptions will be given alongside the query.
"""

# Modify metadata to only include the general query plan instructions
new_metadata = ToolMetadata(
    introductory_tool_description_prefix,
    query_plan_tool.metadata.name,
    query_plan_tool.metadata.fn_schema,
)
query_plan_tool.metadata = new_metadata

# Reconstruct concatenated query engine tool descriptions
tools_description = "\n\n".join(
    [
        f"Tool Name: {tool.metadata.name}\n"
        + f"Tool Description: {tool.metadata.description} "
        for tool in query_engine_tools
    ]
)

# Concatenate tool descriptions and query
query_planned_query = f"{tools_description}\n\nQuery: {query}"

This method ensures that you stay within the character limit while still providing detailed descriptions for accurate query routing.
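The concatenation step can be tried on its own without any LlamaIndex objects. In this sketch the tools are stand-ins built with SimpleNamespace (only .metadata.name and .metadata.description are needed), and the tool names and the query text are made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for QueryEngineTool objects; names and descriptions are examples.
query_engine_tools = [
    SimpleNamespace(
        metadata=SimpleNamespace(
            name="list_tool",
            description="Useful for summarization questions.",
        )
    ),
    SimpleNamespace(
        metadata=SimpleNamespace(
            name="vector_tool",
            description="Useful for retrieving specific context.",
        )
    ),
]
query = "What did the author work on?"

# One "Tool Name / Tool Description" block per tool, separated by blank lines.
tools_description = "\n\n".join(
    f"Tool Name: {tool.metadata.name}\nTool Description: {tool.metadata.description}"
    for tool in query_engine_tools
)

# The tool descriptions travel with the query instead of the tool metadata.
query_planned_query = f"{tools_description}\n\nQuery: {query}"
print(query_planned_query)
```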

For example, in the provided document, the descriptions for the tools are specific to the content they handle:

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description=(
        "Useful for summarization questions related to Paul Graham essay on"
        " What I Worked On."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from Paul Graham essay on What"
        " I Worked On."
    ),
)

These descriptions are concise yet detailed enough to guide the Router Query Engine in selecting the appropriate tool for the query ^[1]^[2]^[3]^[4].

[1] https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/tools/test_types.py
[2] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/agent/openai_agent_lengthy_tools.ipynb
[3] https://github.com/run-llama/llama_index/blob/main/docs/docs/module_guides/querying/router/index.md
[4] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/RouterQueryEngine.ipynb

To continue talking to Dosu, mention @dosu.

gmdl007 commented 3 months ago

This is helpful, but I still have some questions.

If I want the tool to summarize a particular document, do I just say "summarize this xyz document", referring to the document by name, or should I use content from inside the document so the tool can automatically find which document is relevant? How does the tool know which document to summarize?

Guan

(Sent by email in reply to dosubot[bot]'s message of Tuesday, July 30, 2024 at 12:00 PM.)

dosubot[bot] commented 3 months ago

To have the Router Query Engine in LlamaIndex summarize a specific document, describe the document by its content in your query rather than by its name alone. The indices are built from the content of the documents, and the RouterQueryEngine then routes each query to one of the query engines built on those indices.

Here is an example of how you can build the query engine and use it for summarization:

from llama_index.core import SimpleDirectoryReader, Settings, StorageContext, SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import PydanticSingleSelector

# Load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

# Initialize settings (set chunk size)
Settings.chunk_size = 1024
nodes = Settings.node_parser.get_nodes_from_documents(documents)

# Initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# Define Summary Index and Vector Index over the same data
summary_index = SummaryIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

# Define Query Engines and Set Metadata
list_query_engine = summary_index.as_query_engine(response_mode="tree_summarize", use_async=True)
vector_query_engine = vector_index.as_query_engine()

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Useful for summarization questions related to Paul Graham essay on What I Worked On."
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On."
)

# Define Router Query Engine
query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[list_tool, vector_tool]
)

# Use the query engine to summarize the document
response = query_engine.query("What is the summary of the document?")
print(str(response))

In this example, the indices are built from the content of the loaded documents, and the RouterQueryEngine routes the summarization query to a query engine built over that content ^[1]^[2]^[3]^[4].

[1] https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/composability/joint_qa_summary.py
[2] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/composability/joint_qa_summary.py
[3] https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/RouterQueryEngine.ipynb
[4] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/document_summary/base.py

To continue talking to Dosu, mention @dosu.

gmdl007 commented 3 months ago

If I want the tool to summarize only one particular document within an index, how can I do this deterministically?

Guan

(Sent by email in reply to dosubot[bot]'s message of Tuesday, July 30, 2024 at 12:24 PM.)

dosubot[bot] commented 3 months ago

To ensure that the Router Query Engine in LlamaIndex deterministically summarizes a specific document within an index, you should refer to the document by its ID. The get_document_summary method in the DocumentSummaryIndex class retrieves the summary by the document ID.

Here is the relevant code snippet:

def get_document_summary(self, doc_id: str) -> str:
    """Get document summary by doc id.

    Args:
        doc_id (str): A document id.
    """
    if doc_id not in self._index_struct.doc_id_to_summary_id:
        raise ValueError(f"doc_id {doc_id} not in index")
    summary_id = self._index_struct.doc_id_to_summary_id[doc_id]
    return self.docstore.get_node(summary_id).get_content()
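The reason a lookup by ID is deterministic can be seen in a toy version of the same two-step mapping: document ID to summary ID, then summary ID to stored text. No LLM is involved at lookup time. The dictionaries below are hypothetical stand-ins for the index internals, not the library's actual data structures.

```python
# Toy stand-ins for the index internals; all IDs and texts are made up.
doc_id_to_summary_id = {"doc1": "summary-a", "doc2": "summary-b"}
summary_nodes = {
    "summary-a": "Summary of document one.",
    "summary-b": "Summary of document two.",
}


def get_document_summary(doc_id: str) -> str:
    """Return the stored summary for a document ID, or raise if unknown."""
    if doc_id not in doc_id_to_summary_id:
        raise ValueError(f"doc_id {doc_id} not in index")
    summary_id = doc_id_to_summary_id[doc_id]
    return summary_nodes[summary_id]


print(get_document_summary("doc1"))
```

The same ID always returns the same stored text, and an unknown ID fails loudly instead of falling back to some other document.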

Referring to the document by its ID (doc_id) is what makes the lookup deterministic: get_document_summary returns the summary already stored for that ID. Note that the method belongs to DocumentSummaryIndex. The QASummaryQueryEngineBuilder class in the llama_index.core.composability.joint_qa_summary module builds a unified query interface for both QA and summarization tasks, but the engine it returns is a RouterQueryEngine and does not expose get_document_summary, so summaries by ID have to be requested from a DocumentSummaryIndex directly.

Here is an example of how to build a DocumentSummaryIndex and fetch a summary by document ID:

from llama_index.core import DocumentSummaryIndex
from llama_index.core.schema import Document

# Give the document an explicit ID so it can be looked up later
documents = [Document(text="Your document content here", id_="doc1")]

# Build the index; summaries are generated with the configured LLM at build time
doc_summary_index = DocumentSummaryIndex.from_documents(documents)

# Retrieve the stored summary by document ID
summary = doc_summary_index.get_document_summary("doc1")
print(summary)

By using the document ID, you ensure that the summarization process is deterministic and specific to the document you are interested in ^[1]^[2]^[3]^[4].

To continue talking to Dosu, mention @dosu.

dosubot[bot] commented 3 months ago

Another option is to configure the RouterQueryEngine with a summarizer. When the selector picks more than one query engine, a TreeSummarize summarizer combines their responses into a single answer. This controls how responses are merged rather than which document is summarized. Here is an example of how you can set this up:

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.selectors import LLMMultiSelector
from llama_index.core.tools import QueryEngineTool

# Summarizer that merges the responses when several tools are selected
summarizer = TreeSummarize()

# Wrap each query engine in a tool; the selector relies on the descriptions
# (list_query_engine and vector_query_engine are the engines built earlier)
query_engine_tools = [
    QueryEngineTool.from_defaults(
        query_engine=list_query_engine,
        description="Useful for summarization questions.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        description="Useful for retrieving specific context.",
    ),
]

# Initialize the RouterQueryEngine with a multi-selector and the summarizer
router_query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(),
    query_engine_tools=query_engine_tools,
    summarizer=summarizer,
)

# Query the router and print the response
summary = router_query_engine.query("Summarize the document")
print(summary)

This code initializes the RouterQueryEngine with a TreeSummarize summarizer and the provided query engines, and then queries the engine to summarize the document. The TreeSummarize class handles the summarization of sub-results, ensuring that the appropriate summarization is performed for your query ^[1]^[2]^[3]^[4].

[1] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/composability/joint_qa_summary.py
[2] https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/composability/joint_qa_summary.py
[3] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/router_query_engine.py
[4] https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/query_engine/router_query_engine.py

To continue talking to Dosu, mention @dosu.

gmdl007 commented 3 months ago

Here is my complete code:

summary_index = SummaryIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

list_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize", similarity_top_k=4, use_async=True
)
vector_query_engine = vector_index.as_query_engine(similarity_top_k=4)

from llama_index.core.tools import QueryEngineTool

list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description=(
        "Useful for summarizing content from multiple documents, providing concise overviews and key points."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific information from a collection of documents, focusing on precise and detailed answers."
    ),
)

from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.core.selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
)

response = query_engine.query("can you summarize ipv6 related document?")
print(str(response))

I noticed, however, that queries through vector_query_engine are much slower than before. Is this related to the router query engine?

Guan

(Sent by email in reply to dosubot[bot]'s message of Tuesday, July 30, 2024 at 12:55 PM.)

dosubot[bot] commented 3 months ago

Yes, the slowdown can be related to the Router Query Engine. The RouterQueryEngine first asks its selector to choose a query engine (an extra LLM call when using LLMSingleSelector) and then runs the chosen engine, so a routed query takes at least as long as the underlying engine. If the vector_query_engine itself is slow, that also affects the overall performance of the Router Query Engine.

To optimize the performance while still using the Router Query Engine for summarization tasks, you can consider the following approaches:

Optimize the Vector Store Index: Ensure that the vector store index is optimized for fast retrieval. This might involve tuning the underlying vector store parameters or using a more efficient vector storage backend.
Efficient Node Parsing and Ingestion: Make sure that the node parsing and ingestion process is efficient. This can be done by optimizing the transformations applied to the documents and ensuring that the storage context is efficiently handling the nodes.
Selective Query Engine Usage: If summarization tasks are the primary focus, ensure that the Router Query Engine is configured to prioritize the summarization query engine (list_query_engine) over the vector query engine when appropriate.
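Before tuning anything, it helps to measure where the time goes, for example by timing the same question against vector_query_engine.query directly and against the router's query method. The sketch below shows a small timing wrapper; timed_call and fake_engine_query are hypothetical names, and the fake engine only sleeps briefly to stand in for a real query call.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_engine_query(question: str) -> str:
    """Stand-in for a query engine's query method."""
    time.sleep(0.01)
    return f"answer to: {question}"


result, seconds = timed_call(
    fake_engine_query, "can you summarize ipv6 related document?"
)
print(f"{seconds:.3f}s", result)
```

If the direct call and the routed call take about the same time, the engine is the bottleneck; if the routed call is clearly slower, the difference is the selection step.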

Here is a code snippet that shows how the RouterQueryEngine is built and how you might focus on optimizing the summarization tasks:

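from typing import List

from llama_index.core import PromptTemplate, Settings, SimpleDirectoryReader, StorageContext, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import BaseQueryEngine, CustomQueryEngine, RouterQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.selectors import PydanticSingleSelector
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.program.openai import OpenAIPydanticProgram
from pydantic import Field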

def build_from_documents(documents):
    """Build query engine."""
    # parse nodes
    nodes = Settings.node_parser.get_nodes_from_documents(documents)

    # ingest nodes
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    # build indices
    vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
    summary_index = SummaryIndex(nodes, storage_context=storage_context)

    vector_query_engine = vector_index.as_query_engine()
    list_query_engine = summary_index.as_query_engine(response_mode="tree_summarize", use_async=True)

    # build query engine tools
    list_tool = QueryEngineTool.from_defaults(
        query_engine=list_query_engine,
        description="Useful for summarization questions."
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        description="Useful for retrieving specific context."
    )

    # build query engine
    return RouterQueryEngine(
        selector=PydanticSingleSelector.from_defaults(),
        query_engine_tools=[list_tool, vector_tool]
    )

# Example usage
documents = SimpleDirectoryReader("../data/paul_graham").load_data()
query_engine = build_from_documents(documents)
response = query_engine.query("What is the summary of the document?")
print(str(response))

Additionally, you can build a custom router to dynamically decide whether to perform question-answering or summarization based on the query. Here is an example of how to set up a custom router:

class RouterQueryEngine(CustomQueryEngine):
    """Use our Pydantic program to perform routing."""

    query_engines: List[BaseQueryEngine]
    choice_descriptions: List[str]
    verbose: bool = False
    router_prompt: PromptTemplate
    llm: OpenAI
    summarizer: TreeSummarize = Field(default_factory=TreeSummarize)

    def custom_query(self, query_str: str):
        """Define custom query."""

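        # Answers (the Pydantic output class) and get_choice_str are helpers
        # assumed to be defined elsewhere; they are not shown in this snippet.
        program = OpenAIPydanticProgram.from_defaults(
            output_cls=Answers,
            prompt=self.router_prompt,
            verbose=self.verbose,
            llm=self.llm,
        )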

        choices_str = get_choice_str(self.choice_descriptions)
        output = program(context_list=choices_str, query_str=query_str)
        # print choice and reason, and query the underlying engine
        if self.verbose:
            print(f"Selected choice(s):")
            for answer in output.answers:
                print(f"Choice: {answer.choice}, Reason: {answer.reason}")

        responses = []
        for answer in output.answers:
            choice_idx = answer.choice - 1
            query_engine = self.query_engines[choice_idx]
            response = query_engine.query(query_str)
            responses.append(response)

        # if a single choice is picked, we can just return that response
        if len(responses) == 1:
            return responses[0]
        else:
            # if multiple choices are picked, we can pick a summarizer
            response_strs = [str(r) for r in responses]
            result_response = self.summarizer.get_response(
                query_str, response_strs
            )
            return result_response
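The custom router above relies on a get_choice_str helper and on 1-based choice numbers that are converted back to list indices. A minimal sketch of that part, assuming the helper simply numbers the descriptions (the real helper is not shown in this thread and may format them differently):

```python
from typing import List


def get_choice_str(choices: List[str]) -> str:
    """Number the choices starting at 1, one block per choice."""
    return "\n\n".join(
        f"{idx + 1}. {choice}" for idx, choice in enumerate(choices)
    )


choice_descriptions = [
    "Useful for summarization questions.",
    "Useful for retrieving specific context.",
]
choices_str = get_choice_str(choice_descriptions)
print(choices_str)

# The model answers with a 1-based choice number; subtract 1 to index the list.
selected_choice = 2  # hypothetical model output
choice_idx = selected_choice - 1
print(choice_descriptions[choice_idx])
```

Keeping the numbering 1-based in the prompt and converting only at the indexing step matches the choice_idx = answer.choice - 1 line in the class above.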

By focusing on these areas, you can improve the performance of the Router Query Engine while still effectively handling summarization tasks. Additionally, you can refer to the Retriever Router Query Engine tutorial for more advanced configurations and optimizations ^[1]^[2]^[3]^[4].

To continue talking to Dosu, mention @dosu.