opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

Questions about zero-shot #14

Closed: YcChou closed this issue 1 month ago

YcChou commented 1 month ago

Thanks for your great work!

When I evaluate the zero-shot capability of the llama2-7b-chat model on GSM8K, running your code does not reproduce the fp16 baseline result from your paper: 7.7 (reproduced) vs. 19.8 (paper). May I ask what parameters your bash script uses? Mine are as follows:

```shell
python evaluation_gsm8k.py \
    --model 'llama2-7b-chat' \
    --batch_size 6 \
    --max_new_tokens 256 \
    --zero_shot
```

Thx!

HaoKang-Timmy commented 1 month ago

That is because we were using a fine-tuned llama2-7b for evaluation. We have since reworked the codebase, and a new version of the paper will be uploaded soon. We now provide llama3-8b and llama2-13b evaluation on the GSM8K, AQuA, and BBH datasets.