Closed LeoYML closed 1 month ago
Thanks for your interest in our work!
The experiments are done with the GPT-3.5 API --- combining different prompting prefixes or postfixes with the queries from different datasets (e.g., the GSM8K dataset).
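To make the setup concrete, here is a minimal sketch of what "combining a prefix or postfix with a query" looks like. This is not the repo's actual code; the helper name, the sample query, and the postfix string are illustrative, and the commented-out API call assumes the standard OpenAI chat-completions client.

```python
# Sketch only: wrapping a dataset query with a prompting prefix/postfix.
# The function name and example strings are illustrative, not from the paper.

def build_prompt(query: str, prefix: str = "", postfix: str = "") -> str:
    """Wrap a dataset query with an optional prompting prefix and postfix."""
    parts = [p for p in (prefix, query, postfix) if p]
    return "\n".join(parts)

# Example: a zero-shot chain-of-thought-style postfix on a GSM8K-style query.
query = ("Natalia sold clips to 48 of her friends in April, and then she sold "
         "half as many clips in May. How many clips did she sell altogether?")
prompt = build_prompt(query, postfix="Let's think step by step.")

# The resulting prompt would then be sent to GPT-3.5, e.g. (assumed API shape):
#   client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": prompt}],
#   )
print(prompt)
```

Swapping in different prefix/postfix strings while holding the dataset fixed is what lets you compare prompting strategies head-to-head.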
There are some updates on "prompt optimization" after the very early stages when people tried to find the "magic words" as prompting strategies:
According to a later paper by Google (https://arxiv.org/abs/2309.03409), the optimal prompt can differ across LLMs.
The take-away here is that the performance of the prompting strategy is LLM-dependent.
This ICLR'24 paper (https://arxiv.org/pdf/2309.06553) introduces a systematic way of discovering the optimal prompts for different queries.
The take-away here is that the performance of the prompting strategy is query-dependent.
I hope this helps :)
Best, Hao
Thank you very much for your quick and clear response.
Closing inactive issue after 3 months :)
This is an interesting and outstanding piece of work. How can I reproduce the experiments detailed in the README? Additionally, how does the performance compare when using GPT-4?