zhao-zilong / ssc-cot

Git for "Stepwise Self-Consistent Mathematical Reasoning with Large Language Models"
MIT License

What is the performance on GPT4? #1

Closed Zui-C closed 5 months ago

Zui-C commented 5 months ago

Hi, I am very interested in your method. I noticed that Table 1 ("Result on MATH") reports 27.4 in total, and that it is based on LLEMMA. Have you tried it on GPT-4 or GPT-3.5? Can it exceed 51.8%, as PAL does on GPT-4?

zhao-zilong commented 5 months ago

Hi @Zui-C

Actually, our results are based on GPT-3.5. We did not run GPT-4 on MATH. For TriMaster100, I did test GPT-4, though not with 20 runs per question, and got 89/750 points. Given that GPT-3.5 scores only half of that on TriMaster100, SSC-CoT with GPT-4 does improve performance, but it was too slow and expensive, so we ended the experiment after several tries.

Best,

Zilong

zhao-zilong commented 5 months ago

Also, @Zui-C, our results are only on MATH Level 5, not on all levels of the MATH dataset. That makes a big difference, since Level 5 contains the most difficult questions.

Zui-C commented 5 months ago

Thanks! So Table 1 shows results on MATH Level 5 with SSC-CoT on GPT-3.5.