Closed: zhewang1-intc closed this 1 month ago
@wenhuach21 @WeiweiZhang1 could you please take a look?
Accuracy looks good. Model: facebook/opt-125m, quantization parameters:

```python
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size,
                      sym=sym, nsamples=256, seqlen=512, device="cpu", iters=200)
```
asym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 29.0069 | ± | 1.0685 |
| | | none | 0 | acc | 0.3621 | ± | 0.0067 |
sym ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 28.6612 | ± | 1.0599 |
| | | none | 0 | acc | 0.3621 | ± | 0.0067 |
raw model ACC:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 26.0200 | ± | 0.9382 |
| | | none | 0 | acc | 0.3786 | ± | 0.0068 |
int2 AutoRound quantized accuracy, asym:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 184.9410 | ± | 8.5206 |
| | | none | 0 | acc | 0.1758 | ± | 0.0053 |
sym:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 1005.2744 | ± | 50.4376 |
| | | none | 0 | acc | 0.0780 | ± | 0.0037 |
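The large int2 gap between asym and sym is expected: with only four quantization levels, a symmetric grid centered on zero wastes levels whenever a layer's weight distribution is skewed, while an asymmetric grid shifts its zero-point to cover the actual range. A minimal round-trip sketch of that effect, in plain Python and independent of AutoRound (illustrative only, not the kernel used in this PR):

```python
def quantize_dequantize(weights, bits, sym):
    """Fake-quantize a list of floats to `bits` bits and back.

    Returns the reconstructed values, so the round-trip error of the
    symmetric vs asymmetric schemes can be compared directly.
    """
    n_levels = 2 ** bits
    w_min, w_max = min(weights), max(weights)
    if sym:
        # Symmetric: integer grid [-2^(b-1), 2^(b-1)-1], scale anchored at 0.
        qmin, qmax = -(n_levels // 2), n_levels // 2 - 1
        scale = max(abs(w_min), abs(w_max)) / qmax if qmax else 1.0
        zero = 0
    else:
        # Asymmetric: full range mapped onto [0, 2^b - 1] via a zero-point.
        qmin, qmax = 0, n_levels - 1
        scale = (w_max - w_min) / qmax or 1.0
        zero = round(-w_min / scale)
    q = [min(max(round(w / scale) + zero, qmin), qmax) for w in weights]
    return [(qi - zero) * scale for qi in q]

# Skewed (all-positive) weights: at 2 bits, asym reconstructs them with
# lower squared error than sym, mirroring the accuracy gap in the tables.
w = [0.1, 0.4, 0.5, 0.9]
sq_err = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
assert sq_err(quantize_dequantize(w, 2, sym=False), w) < sq_err(quantize_dequantize(w, 2, sym=True), w)
```

At higher bit widths the two schemes converge, which matches the int4 results above where sym and asym score identically on acc.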
Intel/Qwen2-7B-int4-inc, QBits backend accuracy:

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| lambada_openai | 1 | none | 0 | perplexity | 3.5641 | ± | 0.0739 |
| | | none | 0 | acc | 0.7246 | ± | 0.0062 |
As the title says. I added a demo file in the auto_round dir for a better review experience. Note: AutoGPTQ may use QBits as its default CPU inference kernel in the future; please refer to PR https://github.com/AutoGPTQ/AutoGPTQ/pull/660 and issue https://github.com/AutoGPTQ/AutoGPTQ/issues/655. I think this PR can serve as a temporary workaround; once PR 660 gets merged, we can import QuantLinear from AutoGPTQ directly.
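The "temporary workaround" above could be structured as an import fallback: prefer AutoGPTQ's QuantLinear once PR 660 lands, otherwise use the copy bundled with this repo. A hedged sketch; both module paths below are illustrative assumptions, not the actual paths used in this PR:

```python
import importlib

# Candidate module paths, tried in order. Both are assumptions for
# illustration: the first would exist after AutoGPTQ PR 660 is merged,
# the second stands in for the local copy shipped with this PR.
_QUANT_LINEAR_PATHS = (
    "auto_gptq.nn_modules.qlinear.qlinear_qbits",  # hypothetical AutoGPTQ path
    "auto_round.qbits_quant_linear",               # hypothetical local fallback
)

def load_quant_linear():
    """Return the first importable QuantLinear class, or None if neither
    module is available in the current environment."""
    for path in _QUANT_LINEAR_PATHS:
        try:
            return importlib.import_module(path).QuantLinear
        except (ImportError, AttributeError):
            continue
    return None
```

Once PR 660 is merged, the local fallback entry (and the demo file) could simply be deleted, leaving a plain import from AutoGPTQ.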