Closed LeoYML closed 1 month ago
Thanks for your interest in our work!
The experiments are done with the GPT-3.5 API --- combining different prompting prefixes or postfixes with the queries from different datasets (e.g., the GSM8K dataset).
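To make the setup concrete, here is a minimal sketch of what "combining a prefix or postfix with a query" looks like. This is not the repo's actual code; the helper name, the sample query, and the postfix string are illustrative, and the commented-out API call assumes the standard OpenAI chat-completions client.

```python
# Sketch only: wrapping a dataset query with a prompting prefix/postfix.
# The function name and example strings are illustrative, not from the paper.

def build_prompt(query: str, prefix: str = "", postfix: str = "") -> str:
    """Wrap a dataset query with an optional prompting prefix and postfix."""
    parts = [p for p in (prefix, query, postfix) if p]
    return "\n".join(parts)

# Example: a zero-shot chain-of-thought-style postfix on a GSM8K-style query.
query = ("Natalia sold clips to 48 of her friends in April, and then she sold "
         "half as many clips in May. How many clips did she sell altogether?")
prompt = build_prompt(query, postfix="Let's think step by step.")

# The resulting prompt would then be sent to GPT-3.5, e.g. (assumed API shape):
#   client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": prompt}],
#   )
print(prompt)
```

Swapping in different prefix/postfix strings while holding the dataset fixed is what lets you compare prompting strategies head-to-head.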
There are some updates on "prompt optimization" after the very early stages when people tried to find the "magic words" as prompting strategies:
According to a later paper by Google (https://arxiv.org/abs/2309.03409), the optimal prompt can differ across LLMs.
The take-away here is that the performance of the prompting strategy is LLM-dependent.
This ICLR'24 paper (https://arxiv.org/pdf/2309.06553) introduces a systematic way of discovering the optimal prompts for different queries.
The take-away here is that the performance of the prompting strategy is query-dependent.
I hope this helps :)
Best, Hao
Thank you very much for your quick and clear response.
Closing inactive issue after 3 months :)
This is an interesting and outstanding piece of work. How can I reproduce the experiments detailed in the README? Additionally, how does the performance compare when using GPT-4?