intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
https://arxiv.org/abs/2309.05516
Apache License 2.0

autoround_support_qbits_backend #145

Closed zhewang1-intc closed 1 month ago

zhewang1-intc commented 1 month ago

As the title says. A demo file is added in the auto_round dir for a better review experience. Note: AutoGPTQ may use QBits as its default CPU inference kernel in the future; please refer to PR https://github.com/AutoGPTQ/AutoGPTQ/pull/660 and issue https://github.com/AutoGPTQ/AutoGPTQ/issues/655. I think this PR can serve as a temporary workaround; once PR 660 gets merged, we can import QuantLinear from AutoGPTQ directly.
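The temporary-workaround idea could look roughly like the sketch below: prefer AutoGPTQ's QBits QuantLinear once it exists, otherwise fall back to the demo layer shipped in this PR. Both import paths are assumptions for illustration, not the actual layout of either repo.

```python
# Illustrative sketch only -- the module paths below are assumptions, not the merged API.
try:
    # Once AutoGPTQ PR #660 lands, a QBits-backed QuantLinear could be imported directly.
    from auto_gptq.nn_modules.qlinear.qlinear_qbits import QuantLinear  # hypothetical path
except ImportError:
    # Until then, fall back to the demo QuantLinear this PR ships under the auto_round dir.
    from auto_round.qbits_demo.qlinear_qbits import QuantLinear  # hypothetical path
```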

zhewang1-intc commented 1 month ago

@wenhuach21 @WeiweiZhang1 could you pls take a look?

zhewang1-intc commented 1 month ago

acc looks good. Model: facebook/opt-125m, quant params:

```python
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size,
                      sym=sym, nsamples=256, seqlen=512, device="cpu", iters=200)
```

asym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 29.0069 | ± 1.0685 |
| | | none | 0 | acc | 0.3621 | ± 0.0067 |

sym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 28.6612 | ± 1.0599 |
| | | none | 0 | acc | 0.3621 | ± 0.0067 |

raw model ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 26.0200 | ± 0.9382 |
| | | none | 0 | acc | 0.3786 | ± 0.0068 |
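For context, a minimal end-to-end sketch of how these settings could be run; the `quantize`/`save_quantized` calls follow auto-round's usual flow, while the concrete values for `bits`, `group_size`, and `sym` and the output dir are assumptions added here for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumed values for the variables used in the snippet above
# (the tables cover both sym=False and sym=True runs).
bits, group_size, sym = 4, 128, False

autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size,
                      sym=sym, nsamples=256, seqlen=512, device="cpu", iters=200)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4")  # hypothetical output dir
```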
zhewang1-intc commented 1 month ago
int2 autoround quantized acc:

asym:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 184.9410 | ± 8.5206 |
| | | none | 0 | acc | 0.1758 | ± 0.0053 |

sym:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 1005.2744 | ± 50.4376 |
| | | none | 0 | acc | 0.0780 | ± 0.0037 |
zhewang1-intc commented 1 month ago
Intel/Qwen2-7B-int4-inc with QBits backend acc:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 3.5641 | ± 0.0739 |
| | | none | 0 | acc | 0.7246 | ± 0.0062 |
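For reference, a rough sketch of how lambada_openai numbers like these could be reproduced with lm-evaluation-harness's Python API; it assumes the quantized checkpoint loads through transformers with the QBits/auto-round kernels available, and the `model_args` string is illustrative rather than the exact command used here.

```python
import lm_eval

# Evaluate the quantized checkpoint on lambada_openai (0-shot, as in the tables above).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Intel/Qwen2-7B-int4-inc",  # assumes CPU/QBits-compatible loading
    tasks=["lambada_openai"],
)
print(results["results"]["lambada_openai"])  # perplexity and acc with stderr
```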