ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs (second-phase project) with 64K long-context models

Add AWQ models (pre-computed search results) #490

Closed. ymcui closed this issue 6 months ago.

ymcui commented 6 months ago

Description

AWQ (Activation-aware Weight Quantization) is an efficient quantization method for LLMs.

Pre-computed AWQ search results for our models (generated with --w_bit 4 --q_group_size 128) are available at: https://huggingface.co/hfl/chinese-llama-alpaca-2-awq

For detailed usage, please refer to:

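Below is a minimal sketch (not the official instructions) of how the pre-computed search results might be applied with the mit-han-lab/llm-awq CLI to produce a real-quantized 4-bit checkpoint. The local paths and the .pt filename are placeholders; match them to the files actually shipped in the Hugging Face repo.

```bash
# Fetch the pre-computed AWQ search results (local paths are illustrative).
huggingface-cli download hfl/chinese-llama-alpaca-2-awq --local-dir ./awq_cache

# Apply the search results with llm-awq and dump a real-quantized 4-bit model.
# The .pt filename is a placeholder; use the actual file from the repo above.
python -m awq.entry \
    --model_path ./chinese-alpaca-2-7b \
    --w_bit 4 --q_group_size 128 \
    --load_awq ./awq_cache/chinese-alpaca-2-7b-w4-g128.pt \
    --q_backend real \
    --dump_quant ./quant_cache/chinese-alpaca-2-7b-w4-g128-awq.pt
```
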
The following are perplexity (PPL, lower is better) benchmarks under llama.cpp.

| Model | Q8_0 | Q4_K | Q4_1 | Q4_0 |
| --- | --- | --- | --- | --- |
| LLAMA-2-7B | 9.4367 +/- 0.19841 | 9.5558 +/- 0.19945 | 9.9055 +/- 0.2064 | 9.7749 +/- 0.20262 |
| LLAMA-2-7B-W4-G128 | 9.4825 +/- 0.19979 | 9.6023 +/- 0.20070 | 9.6019 +/- 0.19871 | 9.7943 +/- 0.20638 |
| LLAMA-2-7B-16K | 9.3918 +/- 0.20264 | 9.5362 +/- 0.20393 | 9.8198 +/- 0.20869 | 9.8385 +/- 0.20930 |
| LLAMA-2-7B-16K-W4-G128 | 9.4406 +/- 0.20433 | 9.6051 +/- 0.20763 | 9.6090 +/- 0.20606 | 9.6826 +/- 0.20876 |

| Model | Q8_0 | Q4_K | Q4_1 | Q4_0 |
| --- | --- | --- | --- | --- |
| Alpaca-2-7B | 8.1665 +/- 0.11201 | 8.3177 +/- 0.11366 | 8.9491 +/- 0.12054 | 8.6379 +/- 0.11857 |
| Alpaca-2-7B-W4-G128 | 8.2231 +/- 0.11298 | 8.3437 +/- 0.11456 | 8.4342 +/- 0.11515 | 8.4620 +/- 0.11681 |
| Alpaca-2-7B-16K | 8.7512 +/- 0.12241 | 8.9539 +/- 0.12490 | 9.5298 +/- 0.13157 | 9.6554 +/- 0.13464 |
| Alpaca-2-7B-16K-W4-G128 | 8.7890 +/- 0.12288 | 8.9361 +/- 0.12447 | 8.9941 +/- 0.12498 | 9.0204 +/- 0.12591 |
| Alpaca-2-7B-RLHF | 8.2941 +/- 0.11139 | 8.4552 +/- 0.11323 | 9.2239 +/- 0.12269 | 8.7774 +/- 0.11834 |
| Alpaca-2-7B-RLHF-W4-G128 | 8.3554 +/- 0.11242 | 8.4957 +/- 0.11406 | 8.6361 +/- 0.11541 | 8.5973 +/- 0.11602 |

Conclusion: if you use Q4_1 or Q4_0 quantization in llama.cpp, consider applying AWQ first. The W4-G128 variants generally achieve lower PPL at Q4_1 and Q4_0, while Q8_0 and Q4_K show little difference.
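
For reference, here is a hedged sketch of how the AWQ scales could be applied before llama.cpp quantization. It assumes a llama.cpp checkout whose convert.py accepts --awq-path (shipped alongside the awq-py scripts); newer releases may have renamed or removed that option, and all file names and the evaluation corpus below are placeholders.

```bash
# Convert to GGUF while applying the pre-computed AWQ scales (assumes convert.py
# supports --awq-path; check your llama.cpp version), then quantize to Q4_0.
python convert.py ./chinese-alpaca-2-7b \
    --awq-path ./awq_cache/chinese-alpaca-2-7b-w4-g128.pt \
    --outfile ./chinese-alpaca-2-7b-awq-f16.gguf

./quantize ./chinese-alpaca-2-7b-awq-f16.gguf ./chinese-alpaca-2-7b-awq-q4_0.gguf q4_0

# Measure PPL on an evaluation text file to compare against a non-AWQ Q4_0 model.
./perplexity -m ./chinese-alpaca-2-7b-awq-q4_0.gguf -f eval-corpus.txt
```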

Related Issue

None.
