Closed: XuZhao0 closed this issue 1 year ago
Hi @SweetBowl , Thank you for your interest in our work, and sorry for the delayed response.
@madaan do we have any results for AQuA?
Hey @SweetBowl , sorry for the delayed response.
We have some preliminary results here. You can use this evaluation script for evaluation.
$ python -u scripts/aqua_eval.py --path "datasets/outputs/aqua_pal_outputs.jsonl" --type code | grep -e "Acc" -e "Eval"
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 254/254 [00:07<00:00, 33.67it/s]
Accuracy = 45.28% (115/254)
This is the same as what Codex + CoT yields. We haven't explored this much, so if you manage to get a better result (perhaps with a better prompt), please open a PR!
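For anyone who wants to score their own outputs without the repo's script, the accuracy number above boils down to comparing each record's predicted answer against the gold answer in the JSONL file. Here is a minimal sketch; the field names `prediction` and `answer` are assumptions for illustration, and may not match the actual keys in `aqua_pal_outputs.jsonl`.

```python
import json

def jsonl_accuracy(path):
    """Compute exact-match accuracy over a JSONL file of model outputs.

    Assumes each line is a JSON object with hypothetical keys
    'prediction' and 'answer' (check the actual output file for
    the real field names).
    """
    correct = total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            total += 1
            if record.get("prediction") == record.get("answer"):
                correct += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    acc = jsonl_accuracy("datasets/outputs/aqua_pal_outputs.jsonl")
    print(f"Accuracy = {acc:.2%}")
```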
Hi, I found there is a file called aqua.txt which contains a prompt for the AQuA dataset, but I do not see results for it in the paper. Did you test on this dataset? If so, how does it perform?