run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How does Agentic RAG judge if the question shall be answered via single RAG retrieval or multiple retrievals by agent? #13832

Open tonyaw opened 1 month ago

tonyaw commented 1 month ago

Question

I want to use llama-index to build an agentic RAG system, as described in https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction. For most simple questions, I want the system to query RAG once. For complex questions, I want an agent to perform multiple retrievals and summarize the answers. My question is: how should the system decide whether a user's question should be answered by a single RAG query, or by an agent that breaks the query down and performs multiple retrievals?

I have two ideas here:

  1. Add an LLM call at the beginning to judge whether the question a) can be answered by a single RAG retrieval, or b) is complex enough to need multiple retrievals. If the LLM answers "a)", send the question to RAG directly. If it answers "b)", use an agent to do multiple retrievals.
  2. Always send the question to RAG first. Then have the LLM judge whether the answer is comprehensive enough; if not, switch to an agent that performs multiple retrievals.

Could you please suggest how this should be handled?

Question examples:

Thanks in advance!

dosubot[bot] commented 1 month ago

To determine if a user's question should be answered by a single RAG query or if it requires an agent to break down the query and perform multiple retrievals, you can follow these guidelines:

  1. Single Document Query: If the question pertains to a specific aspect or summary of a single document, a single RAG query is sufficient. For example:
    • "Does dispatcher X support IPv6?"
    • "What is statistic counter Y?"
  2. Multi-Document Query: If the question involves comparing or summarizing information across multiple documents, it requires an agent to break down the query and perform multiple retrievals. For example:
    • "Find all dispatcher types that support IPv6."
    • "Find all statistic counters that are related to the billing function."

For implementation, you can consider the following approaches:

Approach 1: Initial LLM Call to Classify Query Complexity

  1. Initial LLM Call: Use an LLM to classify the query as either simple or complex.
  2. Routing Based on Classification:
    • If classified as simple, send the query to the RAG directly.
    • If classified as complex, use an agent to break down the query and perform multiple retrievals.
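
The classify-then-route flow above can be sketched in plain Python. This is a minimal illustration, not LlamaIndex's API: `classify_query`, `route_query`, `CLASSIFY_PROMPT`, and `stub_llm` are hypothetical names, and `llm`, `rag_query`, and `agent_query` are assumed to be callables you supply (for example, thin wrappers around your LLM client, a query engine, and an agent).

```python
from typing import Callable

# Hypothetical prompt; adjust the wording to your domain.
CLASSIFY_PROMPT = (
    "Decide how the question below should be answered.\n"
    "Reply with exactly one letter:\n"
    "a) a single retrieval is enough\n"
    "b) it needs several retrievals combined by an agent\n\n"
    "Question: {question}\nAnswer:"
)


def classify_query(question: str, llm: Callable[[str], str]) -> str:
    """Return "simple" or "complex" based on the LLM's one-letter reply."""
    reply = llm(CLASSIFY_PROMPT.format(question=question)).strip().lower()
    # Any reply that does not start with "a" is treated as complex.
    return "simple" if reply.startswith("a") else "complex"


def route_query(
    question: str,
    llm: Callable[[str], str],
    rag_query: Callable[[str], str],
    agent_query: Callable[[str], str],
) -> str:
    """Send simple questions to single-pass RAG and complex ones to the agent."""
    if classify_query(question, llm) == "simple":
        return rag_query(question)
    return agent_query(question)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to exercise the routing logic."""
    question = prompt.rsplit("Question:", 1)[-1].lower()
    return "b" if "find all" in question else "a"
```

Treating any reply other than a clear "a" as complex means an ambiguous classification costs extra retrievals rather than an incomplete answer.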

Approach 2: Initial RAG Query with Post-Processing

  1. Initial RAG Query: Always send the query to the RAG first.
  2. Post-Processing:
    • Let the LLM judge if the answer is comprehensive enough.
    • If the answer is not comprehensive, switch to using an agent to perform multiple retrievals and provide a more detailed response.
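
As a rough sketch of this retrieve-first flow, again in plain Python rather than any LlamaIndex API: `answer_with_fallback` and `JUDGE_PROMPT` are hypothetical names, and `rag_query`, `agent_query`, and `llm` are assumed to be callables you provide.

```python
from typing import Callable, Tuple

# Hypothetical judging prompt; a stricter rubric may work better in practice.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Draft answer: {answer}\n\n"
    "Does the draft fully answer the question? Reply YES or NO."
)


def answer_with_fallback(
    question: str,
    rag_query: Callable[[str], str],
    agent_query: Callable[[str], str],
    llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Try single-pass RAG first; escalate to the agent if the judge says NO.

    Returns the answer together with the path that produced it.
    """
    draft = rag_query(question)
    verdict = llm(JUDGE_PROMPT.format(question=question, answer=draft))
    if verdict.strip().upper().startswith("YES"):
        return draft, "rag"
    # The draft was judged incomplete, so fall back to multi-step retrieval.
    return agent_query(question), "agent"
```

Compared with Approach 1, every question pays for one RAG pass plus one judging call, but the judge sees the actual draft answer instead of guessing from the question alone.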

Example Implementation

For simple queries:

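# A single vector search against a Qdrant collection (one retrieval, no decomposition).
# `self`, `query`, `query_embedding`, and `query_filter` come from the surrounding
# vector-store class and are not defined in this snippet.
response = self._client.search(
    collection_name=self.collection_name,
    query_vector=query_embedding,
    limit=query.similarity_top_k,  # number of nearest chunks to return
    query_filter=query_filter,
)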

For complex queries:

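from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Assumes `base_query_engine` already exists, e.g. index.as_query_engine();
# the tool name and description below are placeholders.
tools = [
    QueryEngineTool(
        query_engine=base_query_engine,
        metadata=ToolMetadata(name="docs", description="Product documentation"),
    )
]
# Breaks the query into sub-questions, answers each one, then synthesizes a final response.
sub_question_query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
final_answer = sub_question_query_engine.query(main_query)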

Additionally, consider using the Router Query Engine, which may be useful for routing queries to the appropriate engine based on their complexity [1].

By following these guidelines and approaches, you can effectively determine whether a user's question should be answered by a single RAG query or if it requires an agent to perform multiple retrievals.

tonyaw commented 1 month ago

@jerryjliu, could you please take a look at my question, since you are the author of https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction? :-)

tonyaw commented 4 weeks ago

Could you please provide some suggestions? Thanks!