Closed cyLi-Tiger closed 3 months ago
Your fp16 benchmark itself is wrong, so you definitely cannot reproduce the result. Since you did not provide any details about the experiment setting, I assume you just replaced the model name in the command in run.sh. The run.sh file runs gsm8k-cot, whereas Table 2 reports results for gsm8k-zeroshot, and the two are quite different.
Also, many of your questions are already answered in the README file or the paper. I suggest reading them first.
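For context, the difference between the two settings comes from how the prompt is built. The following is a minimal sketch, assuming a generic few-shot chain-of-thought format. The function names and the exemplar are hypothetical illustrations and are not taken from the GEAR code or the paper.

```python
from typing import List, Tuple

# Hypothetical worked exemplar; real CoT benchmarks use a fixed set of such examples.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "Tom starts with 3 apples. 3 + 2 = 5. The answer is 5.",
    ),
]


def build_zeroshot_prompt(question: str) -> str:
    """Zero-shot: the model sees only the question."""
    return f"Question: {question}\nAnswer:"


def build_cot_prompt(question: str, exemplars: List[Tuple[str, str]]) -> str:
    """Few-shot CoT: worked solutions are prepended before the question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in exemplars)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    q = "A book costs 12 dollars. How much do 4 books cost?"
    print(build_zeroshot_prompt(q))
    print("-----")
    print(build_cot_prompt(q, EXEMPLARS))
```

Because the CoT prompt is longer and asks for step-by-step answers, accuracy numbers from the two settings are not directly comparable.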
Thanks for the work. I am trying to reproduce the experiments in your paper.
I use llama2-7b-chat and the default command in /GEAR/GenerationBench/GenerationTest/run.sh to run evaluation_gsm8k.py and get
I then change compress_method to None and get
It seems that the accuracy degradation is much larger than the results in Table 2, where the gsm8k accuracy drops by only ~0.4. I'm using one A100 80G. Please correct me if I misunderstand anything!
Besides, I have some questions: