opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

Can't reproduce the benchmarks #8

Closed cyLi-Tiger closed 3 months ago

cyLi-Tiger commented 3 months ago

Thanks for the work! I'm trying to reproduce the experiments in your paper.

I used llama2-7b-chat and the default command in /GEAR/GenerationBench/GenerationTest/run.sh to run evaluation_gsm8k.py, and got the accuracy shown in the first attached screenshot.

I then changed compress_method to None and got the result in the second screenshot.

The accuracy degradation seems much larger than the results in Table 2, where GSM8K accuracy drops by only ~0.4. I'm running on a single A100 80G; please correct me if I've misunderstood anything!

Besides, I have some questions:

  1. It takes longer to complete gsm8k with the quantized model. Is this expected, or is it just that there is no kernel for 4-bit computation, so the extra quantization operations slow down inference?
  2. How do I actually measure the compression ratio during inference? The compression rate in the paper seems to come from the formula you defined (see the sketch after this list for what I mean by measuring it directly).
  3. What's the difference between TrueCompression and Simulated?
  4. In Appendix C you mention a 2.8x throughput improvement; how can I reproduce that?
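
For reference, here is a rough sketch of what I mean by measuring the ratio directly: count the bytes of the fp16 cache versus the bytes a compressed layout would need. The shapes and the 4-bit + low-rank layout below are my own illustrative guesses at GEAR's ingredients, not the paper's exact formula:

```python
import torch

def tensor_bytes(t: torch.Tensor) -> int:
    """Storage size of a tensor in bytes."""
    return t.numel() * t.element_size()

# Toy KV cache: 32 layers, batch 1, 32 heads, 1024 cached tokens,
# head_dim 128, fp16.
num_layers = 32
k = torch.randn(1, 32, 1024, 128, dtype=torch.float16)
fp16_bytes = num_layers * 2 * tensor_bytes(k)  # keys + values

# Hypothetical compressed layout: 4-bit values packed two per byte,
# an fp16 scale + zero-point per group of 128 values, plus a rank-4
# fp16 low-rank term per head. Sizes are illustrative, not GEAR's
# actual accounting.
group_size, rank, num_heads, seq_len, head_dim = 128, 4, 32, 1024, 128
quant_bytes = k.numel() // 2 + (k.numel() // group_size) * 2 * 2
lowrank_bytes = rank * (seq_len + head_dim) * num_heads * 2
compressed_bytes = num_layers * 2 * (quant_bytes + lowrank_bytes)

print(f"measured compression ratio ≈ {fp16_bytes / compressed_bytes:.2f}x")
```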
HaoKang-Timmy commented 3 months ago

Your fp16 baseline is already wrong, so you definitely cannot reproduce the result. Since you do not provide any details about your experiment setup, I assume you just replaced the model name in the run.sh command. run.sh runs gsm8k-cot, while Table 2 reports gsm8k-zeroshot results, which are quite different.
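
To illustrate the difference (these are not the exact prompts our script builds, just the general shape of the two settings):

```python
question = "Natalia sold clips to 48 of her friends in April ..."

# Zero-shot: the model must answer directly from the bare question.
zeroshot_prompt = f"Question: {question}\nAnswer:"

# CoT: few-shot exemplars with worked reasoning steer the model to
# reason step by step before answering, which changes GSM8K accuracy
# substantially. Exemplar content here is a placeholder.
cot_prompt = (
    "Question: <worked example>\n"
    "Answer: Let's think step by step. <reasoning> The answer is 72.\n\n"
    f"Question: {question}\nAnswer: Let's think step by step."
)
```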

HaoKang-Timmy commented 3 months ago

Also, many of your questions are already answered in the README file and the paper. I would suggest reading those first.