langchain-ai / rag-from-scratch


Is implementation of Part 7: Decomposition really the implementation of Least-To-Most prompting method? #4

Open labdmitriy opened 7 months ago

labdmitriy commented 7 months ago

Hi @rlancemartin,

I have a question about the implementation in Part 7, where you refer to the Least-To-Most Prompting paper from Google.

You mentioned that this method can be implemented by processing all generated queries independently, but in the paper the authors describe sequentially solving each next question based on the question-answer pairs already solved at the previous stage (Figure 1 of the paper). Based on that, it seems this method can't be parallelized in its original form, because each subsequent step depends on all previous sub-questions and their answers.

At 2:00 you mention that we can answer sub-questions in isolation, but at 2:20 you say that the previous solution will be injected into the context of the next sub-question, yet the implementation that follows does not include the previous Q&A pairs in the next question's context. Do these statements contradict each other?

Do I understand correctly that your implementation is closer to the multi-query approach, except that instead of collecting all unique documents across sub-questions, you generate Q&A pairs from those documents and use them as the context for the final answer? The rest of the logic is exactly the same as in multi-query, so it can be parallelized (as in the multi-query section) instead of being processed sequentially in a for loop?

Also, as I understand it, the core part of both stages of the Least-To-Most prompting technique is the demonstration examples, while the other techniques are optional (from the caption of Figure 1 and Section 2 of the paper):

Figure 1: Least-to-most prompting solving a math word problem in two stages: (1) query the language model to decompose the problem into subproblems; (2) query the language model to sequentially solve the subproblems. The answer to the second subproblem is built on the answer to the first subproblem. The demonstration examples for each stage's prompt are omitted in this illustration.

Least-to-most prompting can be combined with other prompting techniques like chain-of-thought (Wei et al., 2022) and self-consistency (Wang et al., 2022b), but does not need to be. Also, for some tasks, the two stages in least-to-most prompting can be merged to form a single-pass prompt.

Also, I found a diagram of the Multi-Query RAG implementation in the README of the LCEL Teacher repository (top part of the screenshot), and as I understand it, that diagram describes exactly what you implemented in Part 7 here.

Could you please share your thoughts on this? I am a little confused by the terminology.

Thank you.

rlancemartin commented 7 months ago

Yes, you are correct!

Decomposition in general means answering each sub-question.

We can either answer them in parallel, as shown, or recursively, as in least-to-most.

My code example shows the parallel approach.
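
Roughly, the parallel variant looks like this (just a sketch, not the notebook verbatim; retriever, llm, qa_prompt, synthesis_prompt, and sub_questions stand in for whatever you have defined):

    from operator import itemgetter

    from langchain_core.output_parsers import StrOutputParser

    # Each sub-question is answered independently -- no sub-answer depends on
    # another, so .batch() can run them concurrently.
    sub_qa_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question")}
        | qa_prompt  # a plain "answer the question from this context" prompt
        | llm
        | StrOutputParser()
    )
    answers = sub_qa_chain.batch([{"question": q} for q in sub_questions])

    # Synthesize the final answer from all (sub-question, answer) pairs.
    qa_context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers))
    final_answer = (synthesis_prompt | llm | StrOutputParser()).invoke(
        {"context": qa_context, "question": question}
    )

The point is that nothing in the first step reads another sub-question's answer, which is exactly what makes it parallelizable.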

Do you think that showing the recursive approach is of high interest? If so, I will add it!

labdmitriy commented 7 months ago

Thank you for your response!

In my opinion, it would be great to implement the approach that is closer to least-to-most prompting: when the authors compared CoT with their method, they noted that CoT solves sub-questions independently, and (as I understand it) the dependent nature of their approach is what improves the results.

But I don't know whether it is possible to automatically prepare representative examples as described in the paper (see the sketch after the quote below):

  1. Decomposition. The prompt in this stage contains constant examples that demonstrate the decomposition, followed by the specific question to be decomposed.
  2. Subproblem solving. The prompt in this stage consists of three parts: (1) constant examples demonstrating how subproblems are solved; (2) a potentially empty list of previously answered sub-questions and generated solutions, and (3) the question to be answered next.
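
Just to illustrate what I mean, the stage-2 prompt could be assembled roughly like this (everything here, including the example text, is hypothetical):

    # Sketch of the paper's stage-2 ("subproblem solving") prompt:
    # (1) constant few-shot examples, (2) a possibly empty list of already
    # answered sub-questions, (3) the next question to answer.
    FEW_SHOT_EXAMPLES = (
        "Q: It takes Amy 4 minutes to climb up and 1 minute to slide down. "
        "How long does one trip take?\n"
        "A: One trip takes 4 + 1 = 5 minutes."
    )  # constant, hand-written demonstrations

    def build_stage2_prompt(solved_pairs, next_question):
        solved = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved_pairs)  # may be empty
        return f"{FEW_SHOT_EXAMPLES}\n\n{solved}\n\nQ: {next_question}\nA:"

The hard part is choosing the constant demonstrations well, not assembling the string.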

As I understand it, this kind of prompt tuning, with optimal example selection and generation, is what frameworks like DSPy aim to solve?

By the way, do you think the following method is viable (or perhaps it already exists)? What if we generate sub-questions and then either compute an aggregated embedding for them (for example, the mean) or group the sub-questions by embedding similarity using a threshold, and then retrieve documents for each aggregated embedding? The goal would be to avoid near-duplicate sub-questions and to reduce the overall number of tokens used.
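
A naive sketch of the grouping idea (the threshold is arbitrary, and embed stands for any embedding function, e.g. an embedding model's embed_query):

    import numpy as np

    def group_subquestions(questions, embed, threshold=0.9):
        """Greedily bucket sub-questions whose embeddings are near-duplicates."""
        groups = []  # each item: {"anchor": unit embedding of first member, "questions": [...]}
        for q in questions:
            v = np.asarray(embed(q), dtype=float)
            v = v / np.linalg.norm(v)
            for g in groups:
                if float(g["anchor"] @ v) >= threshold:  # cosine similarity
                    g["questions"].append(q)
                    break
            else:
                groups.append({"anchor": v, "questions": [q]})
        return groups

    # Then retrieve once per group instead of once per sub-question, e.g.:
    # docs = retriever.invoke(group["questions"][0])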

Thank you.

rlancemartin commented 7 months ago


Good feedback!

I just added it :)

https://github.com/langchain-ai/rag-from-scratch/pull/5

Please keep the helpful feedback coming!

rlancemartin commented 7 months ago

Also note: the approach in the notebook is also similar to IR-CoT:

https://arxiv.org/pdf/2212.10509.pdf

IR-CoT interleaves retrieval with CoT:

(screenshot: the IR-CoT interleaving figure from the paper)

But it does not pre-generate a set of sub-questions up front.
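
Very roughly, IR-CoT's loop is something like this (a heavily simplified sketch; cot_prompt, format_docs, max_steps, and the stopping check are placeholders, and llm is assumed to return a plain string):

    # Each new reasoning sentence becomes the query for the next retrieval step.
    docs = retriever.invoke(question)
    thoughts = []
    for _ in range(max_steps):
        # Generate one more CoT sentence conditioned on everything so far.
        next_thought = llm.invoke(cot_prompt.format(
            context=format_docs(docs),
            question=question,
            thoughts="\n".join(thoughts),
        ))
        thoughts.append(next_thought)
        if "answer is" in next_thought.lower():  # naive stopping check
            break
        docs += retriever.invoke(next_thought)  # retrieve with the new sentence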


The example I show in the notebook combines ideas from both: it pre-generates the sub-questions up front (as in Least-to-Most) and then answers them sequentially, injecting each prior Q&A pair into the context for the next (as in IR-CoT's interleaving).

I find this to be fairly easy to follow.

But open to suggestions!

labdmitriy commented 7 months ago

Hi @rlancemartin,

Thank you so much for your implementation and explanation, it is very useful! I will check the updated code and the mentioned paper soon.

Thank you.

alexiej commented 4 months ago

I've got another question: is the example from the source a good example for decomposition?

We've got the question: "What are the main components of an LLM-powered autonomous agent system?"

And here are the three answers produced by RAG decomposition:

Question 1: What is LLM technology and how does it work in autonomous agent systems?

Answer: LLM stands for Large Language Model, which serves as the core controller in autonomous agent systems. In these systems, LLM functions as the agent's brain and is complemented by key components such as planning, subgoal decomposition, reflection, and refinement. LLM technology allows the agent to break down large tasks into smaller subgoals, enabling efficient handling of complex tasks. Additionally, the agent can engage in self-criticism, self-reflection, and learning from past actions to improve future decision-making processes. Another approach, LLM+P, involves using an external classical planner for long-horizon planning by translating the problem into Planning Domain Definition Language (PDDL) and generating plans based on domain-specific PDDL. Self-reflection is also crucial in autonomous agents, allowing them to iteratively improve by refining past actions and correcting mistakes. ReAct is an example of integrating reasoning and acting within LLM, extending the action space to include task-specific actions and language-based reasoning.

Question 2: What are the specific components that make up an LLM-powered autonomous agent system?

Answer: The specific components that make up an LLM-powered autonomous agent system include:

  1. Planning: This involves breaking down large tasks into smaller, manageable subgoals to enable efficient handling of complex tasks. Task decomposition techniques like Chain of Thought (CoT) and Tree of Thoughts are used to guide the agent in planning ahead and executing tasks step by step.

  2. Reflection and Refinement: The agent can engage in self-criticism and self-reflection over past actions, learn from mistakes, and refine its approach for future steps. This iterative process helps improve the quality of the final results produced by the agent.

These components work in conjunction with the LLM technology, which serves as the core controller in autonomous agent systems, to enable effective decision-making and problem-solving capabilities.

Question 3: How do the main components of an LLM-powered autonomous agent system interact with each other to enable autonomous behavior?

Answer: The main components of an LLM-powered autonomous agent system interact with each other in a coordinated manner to enable autonomous behavior.

  1. Planning: The agent utilizes planning to break down large tasks into smaller, manageable subgoals. Techniques like Chain of Thought (CoT) and Tree of Thoughts help guide the agent in planning ahead and executing tasks step by step. This component ensures that the agent has a clear roadmap for achieving its objectives.

  2. Reflection and Refinement: The agent engages in self-criticism and self-reflection over past actions, learning from mistakes, and refining its approach for future steps. This iterative process allows the agent to improve the quality of its final results by correcting errors and adjusting its strategies based on past experiences.

These components work in conjunction with the LLM technology, which serves as the core controller in the autonomous agent system. The planning component provides a structured approach to task execution, while reflection and refinement enable the agent to adapt and improve over time. Together, these components enable the autonomous agent to make effective decisions, solve complex problems, and exhibit autonomous behavior.

And according to the code, the answer to Question 3 becomes the final answer:

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser

q_a_pairs = ""
for q in questions:
    # Answer the current sub-question using retrieved context plus all
    # previously accumulated Q&A pairs.
    rag_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question"),
         "q_a_pairs": itemgetter("q_a_pairs")}
        | decomposition_prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
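
(For context, the snippet assumes a prompt and a helper roughly like the following; this is an approximation, the exact wording lives in the notebook:)

    from langchain_core.prompts import ChatPromptTemplate

    # Approximation of the notebook's prompt: answer the current sub-question
    # using retrieved context plus any previously answered Q&A pairs.
    decomposition_prompt = ChatPromptTemplate.from_template(
        """Here is the question you need to answer:

    {question}

    Here are any available background question + answer pairs:

    {q_a_pairs}

    Here is additional context relevant to the question:

    {context}

    Use the above context and background Q&A pairs to answer: {question}"""
    )

    def format_qa_pair(question, answer):
        return f"Question: {question}\nAnswer: {answer}"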

Is this the answer to the original question, "What are the main components of an LLM-powered autonomous agent system?"

For me it is NOT: it is the answer to the 3rd question from the decomposition, not to the original one.

I tested the question from the paper:

https://arxiv.org/pdf/2205.10625

And there the example makes more sense, because the last sub-question acts as a summary of the original question.

It takes Amy 4 minutes to climb to the top of a slide. It takes 1 minute to slide down. The water slide closes in 15 minutes. How many times can she slide before it closes?
STEP 1. Decompose the question into 3 questions ================================

1. How many minutes does it take for Amy to complete one round trip on the slide (climbing up and sliding down)?
2. How many round trips can Amy complete in 15 minutes?
3. How many minutes does Amy spend sliding down in total if she slides multiple times before the water slide closes?

STEP 2. Go through the questions and ask them recursively ================================

 == Answer for question:  1. How many minutes does it take for Amy to complete one round trip on the slide (climbing up and sliding down)? ===========================

It takes Amy a total of 5 minutes to complete one round trip on the slide (4 minutes to climb up and 1 minute to slide down).

 == Answer for question:  2. How many round trips can Amy complete in 15 minutes? ===========================

In 15 minutes, Amy can complete a total of 3 round trips on the slide. Each round trip consists of 4 minutes to climb up and 1 minute to slide down, totaling 5 minutes per round trip. Since 15 minutes divided by 5 minutes per round trip equals 3, Amy can complete 3 round trips before the water slide closes.

 == Answer for question:  3. How many minutes does Amy spend sliding down in total if she slides multiple times before the water slide closes? ===========================

If Amy can complete 3 round trips in 15 minutes, and each round trip consists of 1 minute sliding down, then Amy spends a total of 3 minutes sliding down in total before the water slide closes.

 ======= Final answer for 
It takes Amy 4 minutes to climb the top of a slide. 
It takes 1 minute to slide down. 
The water slide closes in 15 minutes. 
How many times can she slide before it closes?
=============
If Amy can complete 3 round trips in 15 minutes, and each round trip consists of 1 minute sliding down, then Amy spends a total of 3 minutes sliding down in total before the water slide closes.