intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
https://arxiv.org/abs/2309.05516
Apache License 2.0

autoround_support_qbits_backend #145

Closed zhewang1-intc closed 1 month ago

zhewang1-intc commented 1 month ago

As the title says. A demo file is added in the auto_round dir for a better review experience. Note: AutoGPTQ may use QBits as its default CPU inference kernel in the future; please refer to PR https://github.com/AutoGPTQ/AutoGPTQ/pull/660 and issue https://github.com/AutoGPTQ/AutoGPTQ/issues/655. I think this PR can serve as a temporary workaround; once PR 660 gets merged, we can import QuantLinear from AutoGPTQ directly.
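The temporary-workaround idea could look roughly like the sketch below: prefer AutoGPTQ's QBits QuantLinear once it exists, otherwise fall back to the demo layer shipped in this PR. Both import paths are assumptions for illustration, not the actual layout of either repo.

```python
# Illustrative sketch only -- the module paths below are assumptions, not the merged API.
try:
    # Once AutoGPTQ PR #660 lands, a QBits-backed QuantLinear could be imported directly.
    from auto_gptq.nn_modules.qlinear.qlinear_qbits import QuantLinear  # hypothetical path
except ImportError:
    # Until then, fall back to the demo QuantLinear this PR ships under the auto_round dir.
    from auto_round.qbits_demo.qlinear_qbits import QuantLinear  # hypothetical path
```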

zhewang1-intc commented 1 month ago

@wenhuach21 @WeiweiZhang1 could you pls take a look?

zhewang1-intc commented 1 month ago

acc looks good. Model: facebook/opt-125m, quant params:

```python
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size,
                      sym=sym, nsamples=256, seqlen=512, device="cpu", iters=200)
```

asym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 29.0069 | ± 1.0685 |
| | | none | 0 | acc | 0.3621 | ± 0.0067 |

sym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 28.6612 | ± 1.0599 |
| | | none | 0 | acc | 0.3621 | ± 0.0067 |

raw model ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 26.0200 | ± 0.9382 |
| | | none | 0 | acc | 0.3786 | ± 0.0068 |
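For context, a minimal end-to-end sketch of how these settings could be run; the `quantize`/`save_quantized` calls follow auto-round's usual flow, while the concrete values for `bits`, `group_size`, and `sym` and the output dir are assumptions added here for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumed values for the variables used in the snippet above
# (the tables cover both sym=False and sym=True runs).
bits, group_size, sym = 4, 128, False

autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size,
                      sym=sym, nsamples=256, seqlen=512, device="cpu", iters=200)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4")  # hypothetical output dir
```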
zhewang1-intc commented 1 month ago
int2 autoround quantized acc:

asym:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 184.9410 | ± 8.5206 |
| | | none | 0 | acc | 0.1758 | ± 0.0053 |

sym:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 1005.2744 | ± 50.4376 |
| | | none | 0 | acc | 0.0780 | ± 0.0037 |
zhewang1-intc commented 1 month ago
Intel/Qwen2-7B-int4-inc with QBits backend acc:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 3.5641 | ± 0.0739 |
| | | none | 0 | acc | 0.7246 | ± 0.0062 |
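For reference, a rough sketch of how lambada_openai numbers like these could be reproduced with lm-evaluation-harness's Python API; it assumes the quantized checkpoint loads through transformers with the QBits/auto-round kernels available, and the `model_args` string is illustrative rather than the exact command used here.

```python
import lm_eval

# Evaluate the quantized checkpoint on lambada_openai (0-shot, as in the tables above).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Intel/Qwen2-7B-int4-inc",  # assumes CPU/QBits-compatible loading
    tasks=["lambada_openai"],
)
print(results["results"]["lambada_openai"])  # perplexity and acc with stderr
```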