huangd1999 / AgentCoder

This repo is the official implementation of AgentCoder and AgentCoder+.

Question regarding Chain-of-Thought in prompts #13

Open · jasonzliang opened this issue 1 day ago

jasonzliang commented 1 day ago

Hi, I noticed that the prompt used by default in the code (for example, humaneval_prompt_update.txt) does not contain the following chain-of-thought prompting:

**Instructions**:
1. **Understand and Clarify**: Make sure you understand the task. 
2. **Algorithm/Method Selection**: Decide on the most efficient way.
3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode. 
4. **Code Generation**: Translate your pseudocode into executable Python code. 

In the AgentCoder paper (https://arxiv.org/pdf/2312.13010), Figure 6 shows the prompt snippet above being used by the programmer agent. Another prompt file (humaneval_prompt.txt) does include the snippet, but the code does not use it. Do you know why? Thanks!

huangd1999 commented 1 day ago

Hi, there are two key reasons. First, the reasoning instructions increase token usage, and the extra tokens are usually more than the task description and the response combined. Second, adding more information to the prompt decreases reasoning ability (there is a trade-off between prompt length and reasoning ability). We therefore provide a shorter version with some of the instructions removed, saved as the updated prompt file. The original prompt is also kept in the prompts folder.
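To make the length cost concrete, here is a minimal sketch that builds a prompt with and without the chain-of-thought block quoted above and compares their sizes. It is an illustration under stated assumptions, not how AgentCoder assembles its prompts: `build_prompt`, `approx_tokens`, and the sample task are hypothetical, and whitespace-separated words stand in for tokens (a model-specific tokenizer would give different absolute counts). It only measures the instruction overhead in the prompt, not the extra tokens the model generates when it follows those instructions.

```python
# Hypothetical sketch: compare prompt length with and without the
# chain-of-thought (CoT) instruction block quoted in this issue.
# Assumption: whitespace-separated words are used as a rough proxy for tokens.

COT_INSTRUCTIONS = """**Instructions**:
1. **Understand and Clarify**: Make sure you understand the task.
2. **Algorithm/Method Selection**: Decide on the most efficient way.
3. **Pseudocode Creation**: Write down the steps you will follow in pseudocode.
4. **Code Generation**: Translate your pseudocode into executable Python code."""


def build_prompt(task: str, use_cot: bool) -> str:
    """Hypothetical helper: task description, optionally followed by the CoT block."""
    parts = [task.strip()]
    if use_cot:
        parts.append(COT_INSTRUCTIONS)
    return "\n\n".join(parts)


def approx_tokens(text: str) -> int:
    """Rough length proxy: count of whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    # Hypothetical HumanEval-style task description, for illustration only.
    task = '''def add(a: int, b: int) -> int:
    """Return the sum of a and b."""'''
    short = build_prompt(task, use_cot=False)
    full = build_prompt(task, use_cot=True)
    overhead = approx_tokens(full) - approx_tokens(short)
    print(f"without CoT: {approx_tokens(short)} words")
    print(f"with CoT:    {approx_tokens(full)} words (+{overhead})")
```

For a short HumanEval-style task, the fixed instruction block can be a large fraction of the prompt, which is the kind of overhead the shorter updated prompt avoids.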

By the way, we recently rewrote our source code along with the paper. The paper draft may take about a month to finish, and the source code will be uploaded once we complete the experiments.

jasonzliang commented 1 day ago

Thank you for letting me know; your response is very informative. Is the trade-off between prompt length and reasoning ability true in general for most LLMs, or does it only affect particular models, such as GPT-4 or GPT-3.5?

huangd1999 commented 1 day ago

Hi, based on the results of the trade-off paper, I believe the trade-off exists widely across LLMs: not only in closed-source models (e.g., GPT-4 and GPT-3.5) but also in open-source ones (e.g., Mixtral).