huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

Is AWQ officially supported now? #313

Closed lifelongeeek closed 2 weeks ago

lifelongeeek commented 1 month ago

I can see that optimum-quanto provides several external (weight-only) quantization algorithms, such as SmoothQuant and AWQ, here.

It looks like SmoothQuant only supports OPT models, and AWQ is still under development. Do you have any further development plans for AWQ?

lifelongeeek commented 1 month ago

Oh, I also notice that optimum-quanto offers HQQ, a calibration-data-free quantization algorithm, and achieves fairly good perplexity with Sheared-LLaMA-1.3B. Is HQQ officially supported in optimum-quanto?

dacorvo commented 1 month ago

HQQ and AWQ both use the same group-wise quantization scheme introduced by GPTQ. They only differ from the original GPTQ algorithm in how they select the scales and adjust the weights during post-training quantization (PTQ). Quanto uses the same group-wise scheme, so its quantized weights are strictly equivalent to those methods, although the AWQ PTQ algorithm itself is not implemented.
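For reference, here is a minimal sketch of quanto's group-wise weight-only quantization using the public quantize/freeze API (the checkpoint below is just a small example model, not a recommendation):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint4

# Load any causal LM; facebook/opt-125m is only used here as a small example.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Group-wise int4 weight-only quantization (activations stay in float).
quantize(model, weights=qint4)

# Replace the float weights by their quantized counterparts.
freeze(model)
```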

If you plan to reuse existing HQQ or AWQ weights, note that their implementations sadly store the quantized weights in formats that quanto can't load at the moment. Quanto does, however, include packing/unpacking code compatible with AWQ weights, so it should be fairly easy to write a conversion script.
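To illustrate the kind of conversion involved, here is a toy int4 pack/unpack round-trip in plain PyTorch. The function names and the sequential nibble order are assumptions for illustration only; real AWQ checkpoints also interleave values within each 32-bit word, so an actual conversion script should rely on quanto's own packing helpers.

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    # q holds unsigned 4-bit values (0..15); the last dim must be a multiple of 8.
    # Eight nibbles are packed into each int32 word, in sequential order here.
    q = q.to(torch.int32).reshape(*q.shape[:-1], -1, 8)
    packed = torch.zeros(q.shape[:-1], dtype=torch.int32)
    for i in range(8):
        packed |= (q[..., i] & 0xF) << (4 * i)
    return packed

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    # Inverse of pack_int4: recover the eight nibbles of each int32 word.
    nibbles = [(packed >> (4 * i)) & 0xF for i in range(8)]
    return torch.stack(nibbles, dim=-1).reshape(*packed.shape[:-1], -1).to(torch.uint8)

# Round-trip check on random int4 data.
w = torch.randint(0, 16, (4, 64), dtype=torch.uint8)
assert torch.equal(unpack_int4(pack_int4(w)), w)
```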

lifelongeeek commented 1 month ago

Thanks for the detailed explanation.

> Quanto does, however, include packing/unpacking code compatible with AWQ weights, so it should be fairly easy to write a conversion script.

Could you point us to any relevant reference docs or code? I am interested in this direction.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for 5 days with no activity.