Closed: lifelongeeek closed this 2 weeks ago
I notice that HQQ, a calibration-data-free quantization algorithm, achieves fairly good perplexity with Sheared-LLaMA-1.3B. Is HQQ officially supported in optimum-quanto?
HQQ and AWQ both use the same group-wise quantization scheme introduced by GPTQ. They only differ from the original GPTQ algorithm in how they select the scale and adjust the weights (PTQ = post-training quantization). Quanto uses a similar quantization scheme, so it is strictly equivalent to these methods, although the AWQ PTQ algorithm is not implemented.
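To make the shared scheme concrete, here is a minimal sketch of group-wise affine quantization in PyTorch. This is illustrative only: the function names, the 4-bit/128-group configuration and the simple min/max scale selection are assumptions for the example, not quanto's actual code. GPTQ, AWQ and HQQ differ precisely in how they pick the scale/zero-point and adjust the weights, not in this storage layout.

```python
# Illustrative sketch of group-wise 4-bit affine quantization (not quanto's code).
import torch

def quantize_groupwise(weight: torch.Tensor, group_size: int = 128, bits: int = 4):
    """Quantize a 2D weight tensor with one (scale, zero-point) per group of inputs."""
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    w = weight.reshape(out_features, in_features // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    qmax = 2**bits - 1
    # Naive min/max scale: this is the part PTQ methods (GPTQ/AWQ/HQQ) improve on.
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_groupwise(q, scale, zero_point, shape):
    """Reconstruct the float weights from the group-wise integer representation."""
    return ((q.float() - zero_point) * scale).reshape(shape)

w = torch.randn(256, 512)
q, s, z = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s, z, w.shape)
print((w - w_hat).abs().mean())  # mean reconstruction error
```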
If you plan to reuse existing HQQ or AWQ weights, note that they have sadly chosen, in their implementations, to store the quantized weights in formats that quanto can't load at the moment, although it includes packing/unpacking code compatible with AWQ weights (so it should be fairly easy to write a conversion script).
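As a rough illustration of the packing involved, below is a generic sketch that packs eight 4-bit values into each int32 word (int32 being the storage width AWQ uses for its packed weights). The sequential nibble order here is an assumption: AWQ's kernels interleave the nibbles in a specific order, so a real conversion script would need to replicate the exact layout from the AWQ or quanto source.

```python
# Generic sketch of 4-bit packing/unpacking into int32 words. Assumption: plain
# sequential nibble order; AWQ's kernels use an interleaved order, so a real
# conversion script must match the layout found in the AWQ source.
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    """Pack uint8 values in [0, 15] along the last dim, 8 per int32 word."""
    assert q.shape[-1] % 8 == 0
    q = q.to(torch.int32).reshape(*q.shape[:-1], -1, 8)
    packed = torch.zeros(q.shape[:-1], dtype=torch.int32)
    for i in range(8):
        packed |= q[..., i] << (4 * i)
    return packed

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4: recover the 8 nibbles stored in each int32 word."""
    nibbles = [(packed >> (4 * i)) & 0xF for i in range(8)]
    return torch.stack(nibbles, dim=-1).reshape(*packed.shape[:-1], -1).to(torch.uint8)

q = torch.randint(0, 16, (4, 64), dtype=torch.uint8)
assert torch.equal(unpack_int4(pack_int4(q)), q)
```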
Thanks for the detailed explanation.
> although it includes packing/unpacking code compatible with AWQ weights (so it should be fairly easy to write a conversion script).
Could you point us to any relevant reference docs or code? I am interested in this direction.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
I can see that optimum-quanto provides several external (weight-only) quantization algorithms, such as smoothquant and awq, here.
It looks like smoothquant only supports OPT models, and awq is still under development. Do you have any further development plans for AWQ?