ymcui / Chinese-LLaMA-Alpaca-2

Chinese LLaMA-2 & Alpaca-2 LLMs, phase-2 project, with 64K long-context models

llama.cpp: update GGUF models (with imatrix) #510

Closed: ymcui closed this issue 5 months ago

ymcui commented 5 months ago

Description

Recently, llama.cpp introduced importance matrix (imatrix)-aware quantization, which yields further improvements in perplexity (PPL). Before quantizing, the importance matrices are computed with llama.cpp's imatrix program. We use PKU Chinese word segmentation training data as the calibration text and iterate over 100 batches to obtain the imatrix.
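A minimal sketch of the imatrix computation, assuming llama.cpp has been built; the file names below are illustrative placeholders, not the exact files used here:

```bash
# Compute the importance matrix from the unquantized (F16) GGUF model.
#   -f       : calibration text (here, Chinese word segmentation data)
#   --chunks : number of chunks/batches of calibration text to process
./imatrix -m chinese-alpaca-2-7b-f16.gguf \
          -f pku-segmentation-train.txt \
          -o chinese-alpaca-2-7b.imatrix \
          --chunks 100
```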

During quantization, pass the generated imatrix file via --imatrix to enable imatrix-aware quantization. Note that quantization takes longer with an imatrix than without one.
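As a sketch, again with illustrative file names, imatrix-aware quantization to one of the K-quant types looks like this:

```bash
# Quantize the F16 model to Q4_K_M, guided by the importance matrix.
# This step is noticeably slower than quantizing without --imatrix.
./quantize --imatrix chinese-alpaca-2-7b.imatrix \
           chinese-alpaca-2-7b-f16.gguf \
           chinese-alpaca-2-7b-q4_k-im.gguf \
           Q4_K_M
```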

Currently, we have converted all available models (K-quants only). You can download them directly from our Hugging Face model hub. Model names with the -im suffix denote the newly converted imatrix-aware models. These models can be used directly without further steps, for example as shown below.
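A downloaded -im model runs with llama.cpp's main example like any other GGUF; the file name and prompt below are illustrative, so substitute the actual file name from the model hub:

```bash
# Inference with an imatrix-quantized model works exactly like any other GGUF.
# Prompt translation: "Please introduce natural language processing."
./main -m chinese-alpaca-2-7b-rlhf-q4_k-im.gguf \
       -p "请介绍一下自然语言处理。" \
       -n 256
```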

The following are PPL benchmarks (lower is better). Generally speaking, imatrix-quantized models perform better, but not always; Q2_K below, for instance, is worse with the imatrix. A sketch of how such PPL measurements can be reproduced follows the tables.

Chinese-Alpaca-2-7B-RLHF-GGUF

| Quant | PPL (original) | PPL (imatrix, -im) |
|-------|----------------|--------------------|
| Q2_K | 10.5211 +/- 0.14139 | 11.9331 +/- 0.16168 |
| Q3_K | 8.9748 +/- 0.12043 | 8.8238 +/- 0.11850 |
| Q4_0 | 8.7843 +/- 0.11854 | - |
| Q4_K | 8.4643 +/- 0.11341 | 8.4226 +/- 0.11302 |
| Q5_0 | 8.4563 +/- 0.11353 | - |
| Q5_K | 8.3722 +/- 0.11236 | 8.3336 +/- 0.11192 |
| Q6_K | 8.3207 +/- 0.11184 | 8.3047 +/- 0.11159 |
| Q8_0 | 8.3100 +/- 0.11173 | - |

Chinese-LLaMA-2-13B-GGUF

| Quant | PPL (original) | PPL (imatrix, -im) |
|-------|----------------|--------------------|
| Q2_K | 14.4701 +/- 0.26107 | 17.4275 +/- 0.31909 |
| Q3_K | 10.1620 +/- 0.18277 | 9.7486 +/- 0.17744 |
| Q4_0 | 9.8633 +/- 0.17792 | - |
| Q4_K | 9.2735 +/- 0.16793 | 9.2734 +/- 0.16792 |
| Q5_0 | 9.3553 +/- 0.16945 | - |
| Q5_K | 9.1767 +/- 0.16634 | 9.1594 +/- 0.16590 |
| Q6_K | 9.1326 +/- 0.16546 | 9.1478 +/- 0.16583 |
| Q8_0 | 9.1394 +/- 0.16574 | - |
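For reference, PPL numbers like those above can be measured with llama.cpp's perplexity tool. The evaluation text file below is an illustrative placeholder; absolute PPL values depend on the evaluation corpus used:

```bash
# Measure perplexity of a quantized model on an evaluation text file.
./perplexity -m chinese-llama-2-13b-q4_k-im.gguf -f eval-text.txt
```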

Related Issue

None.