Hi, the key reason is that, first, the reasoning instruction requires more tokens, usually more than the task description plus the response combined. Second, adding more information to the prompt decreases reasoning ability (there is a trade-off between prompt length and reasoning ability). We therefore provide a short version in which we removed some of the instructions, shipped as an updated prompt file, and we also keep the original prompt in the prompts folder.
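For anyone who wants to check the overhead themselves, here is a rough sketch that compares the token counts of the two prompt files with tiktoken; the file paths and encoding name are my assumptions about the repo layout, so adjust them as needed:

```python
# Rough sketch (paths and encoding are assumptions): compare the token cost of
# the original prompt against the shortened one used by default.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-3.5 / GPT-4

def count_tokens(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

original = count_tokens("prompts/humaneval_prompt.txt")         # full prompt with CoT instructions
shortened = count_tokens("prompts/humaneval_prompt_update.txt")  # shortened default prompt
print(f"original:  {original} tokens")
print(f"shortened: {shortened} tokens")
print(f"saved per call: {original - shortened} tokens")
```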
By the way, we recently rewrote our source code together with the paper. Finishing the paper draft may take about one month, and the source code will be uploaded once we complete the experiments.
Thank you for letting me know; your response is very informative. Is the trade-off between prompt length and reasoning ability true in general for most LLMs, or does it only affect particular models like GPT-4 or GPT-3.5?
Hi, following the results of the trade-off paper, I believe the trade-off exists widely across LLMs: not only in closed-source LLMs (e.g., GPT-4 and GPT-3.5) but also in open-source LLMs (e.g., Mixtral).
Hi, I notice that the prompt used by default in the code (for example, humaneval_prompt_update.txt) does not contain the following chain-of-thought prompting:
In the AgentCoder paper (https://arxiv.org/pdf/2312.13010), Figure 6 shows that the prompt snippet above is used by the programmer agent. There is another prompt file (humaneval_prompt.txt) that does include the snippet, but it is not used by default. Do you know why? Thanks!
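In case it is useful to others, here is a minimal sketch of how one could try the original chain-of-thought prompt manually; the file path, model name, and message layout are assumptions on my side, not the repo's actual API:

```python
# Minimal sketch (path, model, and message layout are assumptions) for sending a
# HumanEval-style task to the model using the original CoT prompt file.
from openai import OpenAI

client = OpenAI()

with open("prompts/humaneval_prompt.txt", encoding="utf-8") as f:
    system_prompt = f.read()  # prompt that still contains the chain-of-thought instructions

task = 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n'

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ],
)
print(response.choices[0].message.content)
```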