[Question]: How does Agentic RAG decide whether a question should be answered by a single RAG retrieval or by multiple retrievals through an agent?

tonyaw commented 6 months ago

Question Validation

[X] I have searched both the documentation and discord for an answer.

Question

I want to use llama-index to build an Agentic RAG, as described in https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction. I want the system to query RAG once for most simple questions. For complex questions, I want the system to use an agent to do multiple retrievals and give a summary based on the answers. My question is: how do I judge whether a user's question should be answered by a single RAG query or by an agent that breaks down the query and does multiple retrievals?

I have two ideas here:

1. Add an additional LLM call at the beginning to judge whether the question a) can be answered by a single RAG retrieval, or b) is complex enough to need multiple retrievals. If the LLM answers "a)", send the question to RAG directly. If it answers "b)", use an agent to do multiple retrievals.
2. Always send the question to RAG first. Find a way to let the LLM judge whether the answer is comprehensive enough; if not, switch to an agent to do multiple retrievals.

Could you please suggest how this should be handled?

Question examples:

Simple examples that can be answered by a single RAG query:
- Does dispatcher X support IPv6?
- What is statistic counter Y?
Complex examples that need an agent to answer via multiple retrievals:
- Find all dispatcher types that support IPv6. (The agent needs to list all dispatcher types first, and then check via RAG whether each dispatcher supports IPv6.)
- Find all statistic counters that are related to the billing function. (The agent needs to list all statistic counters first, and then check whether each counter is related to the billing function.)

Thanks in advance!

dosubot[bot] commented 6 months ago

To determine if a user's question should be answered by a single RAG query or if it requires an agent to break down the query and perform multiple retrievals, you can follow these guidelines:

Single Document Query: If the question pertains to a specific aspect or summary of a single document, a single RAG query is sufficient. For example:
- "Does dispatcher X support IPv6?"
- "What is statistic counter Y?"
Multi-Document Query: If the question involves comparing or summarizing information across multiple documents, it requires an agent to break down the query and perform multiple retrievals. For example:
- "Find all dispatcher types that support IPv6."
- "Find all statistic counters that are related to the billing function."

For implementation, you can consider the following approaches:

Approach 1: Initial LLM Call to Classify Query Complexity

Initial LLM Call: Use an LLM to classify the query as either simple or complex.
Routing Based on Classification:
- If classified as simple, send the query to the RAG directly.
- If classified as complex, use an agent to break down the query and perform multiple retrievals.
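
Approach 1 can be sketched roughly as follows. This is a minimal illustration and not LlamaIndex API: `complete`, `rag_query` and `agent_query` are hypothetical callables standing in for your LLM call, your single-retrieval query engine and your agent, and the prompt wording is an assumption you would want to tune.

```python
from typing import Callable

# Hypothetical prompt; the wording is an assumption, not a tested recipe.
CLASSIFY_PROMPT = (
    "Decide how the question below should be answered.\n"
    "Reply with exactly one word:\n"
    "simple  - one retrieval over the documents is enough\n"
    "complex - several retrievals are needed (e.g. list items, then check each)\n\n"
    "Question: {question}\n"
)


def classify_query(question: str, complete: Callable[[str], str]) -> str:
    """Return "simple" or "complex" based on the model's one-word reply."""
    reply = complete(CLASSIFY_PROMPT.format(question=question)).strip().lower()
    # Anything other than a clear "simple" is treated as complex, so an
    # unparseable reply falls back to the more thorough path.
    return "simple" if reply.startswith("simple") else "complex"


def answer(
    question: str,
    complete: Callable[[str], str],
    rag_query: Callable[[str], str],
    agent_query: Callable[[str], str],
) -> str:
    """Route the question to single-shot RAG or to the agent."""
    if classify_query(question, complete) == "simple":
        return rag_query(question)
    return agent_query(question)
```

With this layout every question pays for one extra classification call, but complex questions skip the single retrieval that would probably be insufficient anyway.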

Approach 2: Initial RAG Query with Post-Processing

Initial RAG Query: Always send the query to the RAG first.
Post-Processing:
- Let the LLM judge if the answer is comprehensive enough.
- If the answer is not comprehensive, switch to using an agent to perform multiple retrievals and provide a more detailed response.
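
A rough sketch of Approach 2, under the same caveats: `rag_query`, `agent_query` and `complete` are hypothetical callables for your query engine, your agent and your LLM, and the judge prompt is only an assumed starting point.

```python
from typing import Callable, Tuple

# Hypothetical judge prompt; the wording is an assumption.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Answer: {answer}\n\n"
    "Does the answer fully address the question? Reply yes or no.\n"
)


def answer_with_fallback(
    question: str,
    rag_query: Callable[[str], str],
    agent_query: Callable[[str], str],
    complete: Callable[[str], str],
) -> Tuple[str, str]:
    """Try single-shot RAG first; escalate to the agent if the judge says no.

    Returns the answer text and which path produced it ("rag" or "agent").
    """
    first = rag_query(question)
    verdict = complete(
        JUDGE_PROMPT.format(question=question, answer=first)
    ).strip().lower()
    if verdict.startswith("yes"):
        return first, "rag"
    # Any reply other than a clear "yes" escalates to multiple retrievals.
    return agent_query(question), "agent"
```

Here simple questions cost one retrieval plus one judge call, while complex questions pay for the first retrieval, the judge call and the agent run. One likely weakness is that a "find all ..." question can produce an answer that looks complete to the judge even though only some items were retrieved.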

Example Implementation

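For simple queries, a single vector search is enough. The call below is the low-level search as it would appear inside a vector store class; in application code you would normally call your query engine's query() method instead:

# self._client is a Qdrant-style vector database client; query,
# query_embedding and query_filter come from the surrounding method.
response = self._client.search(
    collection_name=self.collection_name,
    query_vector=query_embedding,      # embedding of the user's question
    limit=query.similarity_top_k,      # number of chunks to return
    query_filter=query_filter,         # optional metadata filter
)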

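For complex queries (pseudocode: generate_sub_questions, prune_sub_questions, retrieve_answer and combine_answers are illustrative names, not LlamaIndex methods; SubQuestionQueryEngine.query() handles decomposition, per-sub-question querying and synthesis internally):

# 1. Break the main query into sub-questions and drop redundant ones.
sub_questions = sub_question_query_engine.generate_sub_questions(main_query)
sub_questions = prune_sub_questions(sub_questions)
# 2. Answer each sub-question with its own retrieval.
answers = []
for sub_question in sub_questions:
    answer = sub_question_query_engine.retrieve_answer(sub_question)
    answers.append(answer)
# 3. Synthesize the partial answers into one response.
final_answer = combine_answers(answers)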

Additionally, consider the Router Query Engine, which selects among several query engines based on the description given to each, so it can route simple and complex queries to different engines ^[1].

By following these guidelines and approaches, you can effectively determine whether a user's question should be answered by a single RAG query or if it requires an agent to perform multiple retrievals.


tonyaw commented 6 months ago

@jerryjliu, could you please help to check my question as you are the author of https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction? :-)

tonyaw commented 6 months ago

Could you please provide some suggestions? Thanks!

run-llama / llama_index