Closed · ascendpoet closed 2 weeks ago
Thank you for your interest in our work! The instructions are given in the README. If you are interested in reproducing the numbers reported in our paper:
For LongBench, you can obtain the results with:

```bash
# In our paper, K_BITS == V_BITS == 2, GROUP_LENGTH == 32, RESIDUAL_LENGTH == 128
bash scripts/long_test.sh {GPU_ID} {K_BITS} {V_BITS} {GROUP_LENGTH} {RESIDUAL_LENGTH} {MODEL_NAME}
python eval_long_bench.py --model {MODEL}  # MODEL is the dir name under pred/
```

Currently, it supports Llama-family models and the Mistral model.
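To make the knobs above concrete, here is a minimal NumPy sketch of what `K_BITS`, `GROUP_LENGTH`, and `RESIDUAL_LENGTH` control. This is a simplified 1-D illustration, not the repo's actual kernel (KIVI quantizes keys per-channel and values per-token): older cache entries are quantized group-wise at low bit-width, while the most recent `RESIDUAL_LENGTH` entries stay in full precision.

```python
# Hypothetical illustration of KIVI's quantization knobs (not the repo's code).
import numpy as np

K_BITS = 2             # paper setting: 2-bit keys
GROUP_LENGTH = 32      # values per quantization group
RESIDUAL_LENGTH = 128  # most recent entries kept in full precision

def quantize_group(x, bits):
    """Asymmetric uniform quantization of one group to `bits` bits."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_group(q, scale, lo):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
cache = rng.standard_normal(512).astype(np.float32)  # toy 1-D "key cache"

quantized_part = cache[:-RESIDUAL_LENGTH]  # older entries: quantized
residual_part = cache[-RESIDUAL_LENGTH:]   # recent entries: full precision

recon = []
for i in range(0, len(quantized_part), GROUP_LENGTH):
    g = quantized_part[i:i + GROUP_LENGTH]
    q, scale, lo = quantize_group(g, K_BITS)
    recon.append(dequantize_group(q, scale, lo))
recon = np.concatenate(recon + [residual_part])

err = np.abs(recon - cache).max()
print(f"max reconstruction error: {err:.3f}")
```

The residual window is reconstructed exactly; only the older, quantized entries incur error, which shrinks as `K_BITS` grows or `GROUP_LENGTH` shrinks.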
For tasks like GSM8K, CoQA, and TruthfulQA, you can obtain the results with:

```bash
git checkout lmeval  # switch to the existing lmeval branch (not `-b`, which would create a new one)
git pull
cd lm-evaluation-harness
pip install -e .
cd ..
# We report TASK in {coqa, truthfulqa_gen, gsm8k} in our paper.
# If using the KIVI implementation, set K_BITS and V_BITS to 2 or 4.
# If using the baseline, set K_BITS and V_BITS to 16.
bash scripts/lmeval_test.sh {GPU_ID} {K_BITS} {V_BITS} {GROUP_LENGTH} {RESIDUAL_LENGTH} {TASK} {MODEL_NAME}
```
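As an illustrative invocation, filling the placeholders with the paper's settings (the GPU id and model name below are example choices, not prescribed values):

```bash
# Hypothetical example: GPU 0, 2-bit K/V, group 32, residual 128, gsm8k task.
# The model name is only an example; substitute the one you are evaluating.
bash scripts/lmeval_test.sh 0 2 2 32 128 gsm8k meta-llama/Llama-2-7b-hf
```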
Let me know if you cannot reproduce our results or have further questions.
This project is excellent. Could you provide an accuracy-testing interface? It would save a lot of time in accuracy testing.