run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.25k stars, 4.65k forks

[Question]: How to create a querypipeline to chat with text docs and sql tables. #13775

Open JonOnEarth opened 1 month ago

JonOnEarth commented 1 month ago

Question

I followed these two advanced examples [1, 2] and can now chat with SQL tables and with text docs separately. How can I stitch them together, so that I can chat with multiple SQL tables and text docs at the same time? The process should be to choose the correct doc or table based on the query, and then chat with the chosen one. If it is a table, the text-to-SQL component is needed as well. Also, if I ask a query that is irrelevant to these documents, how can I just use the LLM's reply instead of always searching the documents?

logan-markewich commented 1 month ago

Basically, add some kind of router to route between the paths. Query pipelines support conditional links.
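
A minimal, library-free sketch of that routing idea (not LlamaIndex's own API): each path gets a description, a selector picks one path per query, and a plain-LLM path acts as the fallback for unrelated questions. The names `Route`, `keyword_selector`, `route_query`, and the three handler functions are hypothetical stand-ins. In a real pipeline the selector would be LLM-based and the handlers would be the SQL and RAG pipelines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    description: str               # what a selector would read to decide
    keywords: List[str]            # hypothetical: only used by the toy selector below
    handler: Callable[[str], str]  # the path to run when this route is chosen


# Placeholder paths; each would wrap a real pipeline in practice.
def sql_path(query: str) -> str:
    return f"[text-to-SQL path] {query}"


def docs_path(query: str) -> str:
    return f"[document RAG path] {query}"


def llm_only_path(query: str) -> str:
    return f"[plain LLM path] {query}"


ROUTES = [
    Route("Answers questions about wiki tables data", ["table", "sql", "rows"], sql_path),
    Route("Answers questions about Paul Graham", ["paul graham", "essay"], docs_path),
    Route("Answers everything else with the LLM alone", [], llm_only_path),
]


def keyword_selector(query: str, routes: List[Route]) -> int:
    """Toy selector: index of the first route with a matching keyword.

    Falls back to the last route (the plain-LLM path) when nothing matches.
    """
    lowered = query.lower()
    for i, route in enumerate(routes):
        if any(keyword in lowered for keyword in route.keywords):
            return i
    return len(routes) - 1


def route_query(query: str) -> str:
    """Pick one route for the query and run only that path."""
    chosen = ROUTES[keyword_selector(query, ROUTES)]
    return chosen.handler(query)
```

The point of the design is that there is exactly one handler per description, and the fallback is just another route, so irrelevant queries never touch the retrievers.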

JonOnEarth commented 1 month ago

Thanks @logan-markewich. I am new to LlamaIndex and still have a lot of confusion; I am just trying to learn from the examples and from real problems. Here is what I did, and the error I got at the end.

I combined the two examples as the basic query pipelines, like this. 1st example, for SQL tables:

qp = QP(
    modules={
        "input": InputComponent(),
        "table_retriever": obj_retriever,
        "table_output_parser": table_parser_component,
        "text2sql_prompt": text2sql_prompt,
        "text2sql_llm": llm,
        "sql_output_parser": sql_parser_component,
        "sql_retriever": sql_retriever,
        "response_synthesis_prompt": response_synthesis_prompt,
        "response_synthesis_llm": llm,
    },
    verbose=True,
)
qp.add_chain(["input", "table_retriever", "table_output_parser"])
qp.add_link("input", "text2sql_prompt", dest_key="query_str")
qp.add_link("table_output_parser", "text2sql_prompt", dest_key="schema")
qp.add_chain(
    ["text2sql_prompt", "text2sql_llm", "sql_output_parser", "sql_retriever"]
)
qp.add_link(
    "sql_output_parser", "response_synthesis_prompt", dest_key="sql_query"
)
qp.add_link(
    "sql_retriever", "response_synthesis_prompt", dest_key="context_str"
)
qp.add_link("input", "response_synthesis_prompt", dest_key="query_str")
qp.add_link("response_synthesis_prompt", "response_synthesis_llm")

2nd example for text RAG:

p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

I added a 3rd query pipeline that just uses the LLM:

qp_llm = QueryPipeline(
    modules={
        "llm": llm2,
    },
    verbose=True,
)

Then I added the router:

# define selector
selector = LLMSingleSelector.from_defaults()
choices = [
    "This tool answers questions related to wiki tables data",
    "This tool contains the knowledge about Paul Graham",
    "This tool only uses LLM itself to answer the questions non-related to Paul Graham and wiki tables data. "
]
router_c = RouterComponent(
    selector=selector,
    choices=choices,
    components=[p, qp_llm], #qp
    verbose=True,
)
# top-level pipeline
qp_t = QueryPipeline(chain=[router_c], verbose=True)

It returns the error:

/usr/local/lib/python3.10/dist-packages/llama_index/core/query_pipeline/components/router.py in __init__(self, selector, choices, components, verbose)
    109             # validate component has one input key
    110             if len(new_component.free_req_input_keys) != 1:
--> 111                 raise ValueError("Expected one required input key")
    112             query_keys.append(next(iter(new_component.free_req_input_keys)))
    113             new_components.append(new_component)

ValueError: Expected one required input key

lazyFrogLOL commented 2 days ago

Is there a solution for this issue? I ran into it as well. How do I define a free required input key (free_req_input_keys) for a QueryPipeline's modules?

lazyFrogLOL commented 2 days ago

@JonOnEarth I solved the issue by replacing InputComponent() with a specific PromptTemplate object.

prompt_str = "{query}"
prompt_tmpl = PromptTemplate(prompt_str)
p = QueryPipeline(verbose=True)
p.add_modules(
    {
        "input": prompt_tmpl,
        "retriever": retriever,
        "summarizer": summarizer,
    }
)
p.add_link("input", "retriever")
p.add_link("input", "summarizer", dest_key="query_str")
p.add_link("retriever", "summarizer", dest_key="nodes")

You can try using this code.
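
A possible reading of why this helps, sketched in plain Python rather than with LlamaIndex itself: the check in the traceback requires each sub-pipeline to expose exactly one free required input key. Assuming `InputComponent` declares no required keys (it passes through whatever it is given), while a `PromptTemplate("{query}")` requires its single template variable, swapping one for the other changes the count from zero to one. `ToyComponent`, `template_vars`, and `check_router_component` are hypothetical names used only to mirror that check.

```python
import string
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyComponent:
    """Hypothetical stand-in for the root module of a sub-pipeline."""
    name: str
    free_req_input_keys: Set[str] = field(default_factory=set)


def template_vars(template: str) -> Set[str]:
    """Collect the named fields of a format string, e.g. '{query}' -> {'query'}."""
    parsed = string.Formatter().parse(template)
    return {field_name for _, field_name, _, _ in parsed if field_name}


def check_router_component(component: ToyComponent) -> str:
    """Mirror the check from the traceback: exactly one required key is allowed."""
    if len(component.free_req_input_keys) != 1:
        raise ValueError("Expected one required input key")
    return next(iter(component.free_req_input_keys))


# Assumption: an InputComponent-style root requires no specific key.
input_root = ToyComponent("input")

# Assumption: a prompt-template root requires each of its template variables.
prompt_root = ToyComponent("input", template_vars("{query}"))
```

Under these assumptions, `check_router_component(prompt_root)` passes and yields the key name `query`, while `check_router_component(input_root)` raises the same `ValueError` as in the issue.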

JonOnEarth commented 2 days ago

@lazyFrogLOL Thanks for the solution.