jy-yuan / KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
https://arxiv.org/abs/2402.02750
MIT License

Provide an accuracy testing interface? #8

Closed ascendpoet closed 2 weeks ago

ascendpoet commented 3 weeks ago

The project is excellent. Can you provide an accuracy testing interface? It can save much time in accuracy testing.

zirui-ray-liu commented 3 weeks ago

Thank you for your interest in our work! The instructions are given in the README. If you are interested in reproducing the numbers reported in our paper:

For LongBench, you can obtain the result by

# In our paper, K_BITS==V_BITS==2, GROUP_LENGTH==32, RESIDUAL_LENGTH==128
bash scripts/long_test.sh {GPU_ID} {K_BITS} {V_BITS} {GROUP_LENGTH} {RESIDUAL_LENGTH} {MODEL_NAME}
python eval_long_bench.py --model {MODEL}  # MODEL is the directory name under pred/. Currently this supports Llama-family models and Mistral models.
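As a concrete illustration, the two LongBench commands above could be assembled with the paper's settings as follows. The GPU id and model name here are hypothetical placeholders, not values from this thread; the sketch only builds and prints the command lines:

```shell
# Paper settings: K_BITS = V_BITS = 2, GROUP_LENGTH = 32, RESIDUAL_LENGTH = 128.
# GPU_ID and MODEL_NAME are hypothetical placeholders for illustration.
GPU_ID=0
K_BITS=2
V_BITS=2
GROUP_LENGTH=32
RESIDUAL_LENGTH=128
MODEL_NAME="llama-2-7b"

# Build the two commands exactly as the README describes them.
RUN_CMD="bash scripts/long_test.sh $GPU_ID $K_BITS $V_BITS $GROUP_LENGTH $RESIDUAL_LENGTH $MODEL_NAME"
EVAL_CMD="python eval_long_bench.py --model $MODEL_NAME"
echo "$RUN_CMD"
echo "$EVAL_CMD"
```

Running the first command writes predictions under pred/, and the second scores whatever directory name ends up there.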

For tasks like GSM8K, CoQA, TruthfulQA, you can obtain the result by

git fetch origin
git checkout lmeval  # switch to the lmeval branch

cd lm-evaluation-harness
pip install -e .
cd ..

# We report TASK in {coqa, truthfulqa_gen, gsm8k} in our paper.
# If using the KIVI implementation, set K_BITS and V_BITS to 2 or 4.
# If using the FP16 baseline, set K_BITS and V_BITS to 16.
bash scripts/lmeval_test.sh {GPU_ID} {K_BITS} {V_BITS} {GROUP_LENGTH} {RESIDUAL_LENGTH} {TASK} {MODEL_NAME}
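For instance, a GSM8K run with 2-bit KIVI at the paper's settings might be invoked as below. Again, the GPU id and model name are hypothetical placeholders; the sketch only assembles and prints the command line:

```shell
# Paper settings for a KIVI run: 2-bit keys and values, group 32, residual 128.
# GPU_ID and MODEL_NAME are hypothetical placeholders for illustration.
GPU_ID=0
K_BITS=2
V_BITS=2
GROUP_LENGTH=32
RESIDUAL_LENGTH=128
TASK="gsm8k"          # one of: coqa, truthfulqa_gen, gsm8k
MODEL_NAME="llama-2-7b"

CMD="bash scripts/lmeval_test.sh $GPU_ID $K_BITS $V_BITS $GROUP_LENGTH $RESIDUAL_LENGTH $TASK $MODEL_NAME"
echo "$CMD"
```

Swapping K_BITS and V_BITS to 16 turns the same invocation into the FP16 baseline run for comparison.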

Let me know if you cannot reproduce our results or have further questions.