opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

questions about GenerationTest folder #4

Closed hzfengfengxia closed 4 months ago

hzfengfengxia commented 5 months ago

Hi, thank you for your amazing work! I saw the sentence

Llama-2-7b, Llama-2-13b, and Mistral-7b with GEAR test on GSM8K, GSM8K-COT, MMLU, MMLU-COT, and BBH-COT

in the README file inside the GenerationBench/GenerationTest folder. However, when reviewing the code (`evaluation_gsm8k.py`), I couldn't identify where GEAR is used.

Additionally, in the compression method section

  --compress_method {groupquantization, groupquantization_token, groupquantization_channel, groupquantization_kc_vt, uniformquantization, poweriteration, outlierquantization, quantize_with_lrap, outliterquantize_with_lrap}, 

GEAR is not mentioned either. Could you please advise me on how to address this? If I want to run ablation experiments on the low-rank matrices, how should I proceed?

HaoKang-Timmy commented 4 months ago

For the first question, see https://github.com/opengear-project/GEAR/blob/79ad3fcdb528fceaf605923479fe14fdf3953ffd/GenerationBench/GenerationTest/evaluation_gsm8k.py#L399

For the second question: `outliterquantize_with_lrap` is one instantiation of the GEAR framework; we will support more variants beyond outlier quantization in the future.
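To clarify the connection between the `_with_lrap` options and GEAR: the framework's core idea is to quantize the KV cache and then approximate the resulting quantization residual with a low-rank matrix. Below is a minimal NumPy sketch of that idea. The function names (`uniform_quantize`, `gear_like_compress`) and the use of full SVD are illustrative assumptions, not the repository's actual implementation (GEAR uses power iteration for efficiency and additionally handles outliers in full precision).

```python
import numpy as np

def uniform_quantize(x, bits=4):
    # Illustrative per-tensor uniform quantization to `bits` bits,
    # returned in dequantized (float) form.
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((x - lo) / scale)
    return q * scale + lo

def gear_like_compress(kv, bits=4, rank=4):
    # Sketch of the GEAR-style recipe: quantize the KV tensor, then
    # correct it with a rank-`rank` approximation of the quantization
    # residual. Full SVD is used here for clarity; the paper uses
    # power iteration to keep the cost low.
    q = uniform_quantize(kv, bits)
    residual = kv - q
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return q + low_rank

rng = np.random.default_rng(0)
kv = rng.standard_normal((64, 128))        # stand-in for a KV cache slice
approx = gear_like_compress(kv, bits=4, rank=4)

err_quant_only = np.linalg.norm(kv - uniform_quantize(kv, bits=4))
err_gear_like = np.linalg.norm(kv - approx)
# The low-rank residual correction should lower the reconstruction error.
print(err_gear_like < err_quant_only)
```

For the ablation you mention, varying `rank` in a sketch like this (or the corresponding rank argument in the repository's `_with_lrap` methods) is the knob that controls how much of the residual the low-rank term recovers.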

HaoKang-Timmy commented 4 months ago

Also, see here for the full command details: https://github.com/opengear-project/GEAR/blob/main/GenerationBench/GenerationTest/run.sh