Closed: XuZhao0 closed this issue 1 year ago
Hi @SweetBowl , Thank you for your interest in our work, and sorry for the delayed response.
@madaan do we have any results for AQuA?
Hey @SweetBowl , sorry for the delayed response.
We have some preliminary results here. You can use this evaluation script for evaluation.
$ python -u scripts/aqua_eval.py --path "datasets/outputs/aqua_pal_outputs.jsonl" --type code | grep -e "Acc" -e "Eval"
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 254/254 [00:07<00:00, 33.67it/s]
Accuracy = 45.28% (115/254)
This is the same as what Codex + CoT yields. We haven't explored this much, so if you manage to get a better result (perhaps with a better prompt), please open a PR!
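For anyone who wants to score their own outputs without the repo's script, the accuracy number above boils down to comparing each record's predicted answer against the gold answer in the JSONL file. Here is a minimal sketch; the field names `prediction` and `answer` are assumptions for illustration, and may not match the actual keys in `aqua_pal_outputs.jsonl`.

```python
import json

def jsonl_accuracy(path):
    """Compute exact-match accuracy over a JSONL file of model outputs.

    Assumes each line is a JSON object with hypothetical keys
    'prediction' and 'answer' (check the actual output file for
    the real field names).
    """
    correct = total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            total += 1
            if record.get("prediction") == record.get("answer"):
                correct += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    acc = jsonl_accuracy("datasets/outputs/aqua_pal_outputs.jsonl")
    print(f"Accuracy = {acc:.2%}")
```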
Hi, I found there is a file called aqua.txt which contains a prompt for the AQuA dataset, but I do not see results for it in the paper. Did you test on this dataset? If so, how does it perform?