mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Regarding issues encountered with w_bit 3 quantization #231

Open langxinspieder opened 4 weeks ago

langxinspieder commented 4 weeks ago

Very nice work! I have run into a problem: I can quantize with w_bit 4 and group size 128, but are the w_bit 3, group size 128 results reported in the paper obtained from real quantized weights, or are they perplexity numbers from simulated quantization? I ran the same command that works for 4-bit quantization but changed w_bit 4 to w_bit 3, and it fails with the error shown in the attached screenshot. What should I do?
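For context, perplexity for a 3-bit configuration can be evaluated without 3-bit GPU kernels by simulated (pseudo) quantization: the weights are rounded to the low-bit grid and immediately dequantized back to floating point, so the model still runs with ordinary dense kernels. The snippet below is a minimal sketch of that idea, not the repository's actual implementation; the function name, the asymmetric min/max scheme, and the tensor shapes are illustrative assumptions.

```python
# Minimal sketch of group-wise pseudo (simulated) weight quantization.
# NOT the repo's implementation; it only illustrates why 3-bit perplexity
# can be measured without 3-bit kernels: weights are quantized and then
# dequantized back to floating point before the forward pass.
import torch

def pseudo_quantize_tensor(w: torch.Tensor, n_bit: int = 3,
                           group_size: int = 128) -> torch.Tensor:
    """Quantize-dequantize `w` per group of `group_size` input channels."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w_g = w.reshape(out_features, in_features // group_size, group_size)

    # Asymmetric per-group min/max quantization.
    max_val = w_g.amax(dim=-1, keepdim=True)
    min_val = w_g.amin(dim=-1, keepdim=True)
    qmax = 2 ** n_bit - 1
    scales = (max_val - min_val).clamp(min=1e-5) / qmax
    zeros = (-min_val / scales).round()

    # Round to the integer grid, then map back to floating point.
    w_q = torch.clamp((w_g / scales).round() + zeros, 0, qmax)
    w_dq = (w_q - zeros) * scales
    return w_dq.reshape(out_features, in_features)

# Example: simulate 3-bit, group-size-128 quantization of one linear layer.
weight = torch.randn(4096, 4096)
weight_w3g128 = pseudo_quantize_tensor(weight, n_bit=3, group_size=128)
```

Running such a quantize-dequantize pass over every linear layer yields w3 perplexity numbers with ordinary fp16 kernels; producing real packed 3-bit weights is a separate step, and that is where the error below occurs.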

terarachang commented 6 days ago

Hi, I got a similar issue when generating real quantized weights (w3).

  File "/home/username/llm-awq/awq/quantize/qmodule.py", line 83, in __init__
    raise NotImplementedError("Only 4-bit are supported for now.")

Any solutions?
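For reference, the check that the traceback points at is, in simplified form, something like the guard below. This is a sketch reconstructed from the error message, not the actual contents of awq/quantize/qmodule.py; the real constructor also allocates the packed qweight, scales, and zeros buffers.

```python
# Simplified sketch reconstructed from the traceback; not the actual
# qmodule.py. The packed-weight module rejects any bit width other than 4
# at construction time, so real 3-bit quantization fails before any CUDA
# kernel is ever launched.
import torch.nn as nn

class WQLinearSketch(nn.Module):
    def __init__(self, w_bit: int, group_size: int,
                 in_features: int, out_features: int):
        super().__init__()
        if w_bit != 4:
            # The INT4 weight-packing layout and its matching GPU kernels
            # are what this restriction reflects.
            raise NotImplementedError("Only 4-bit are supported for now.")
        self.w_bit = w_bit
        self.group_size = group_size
        self.in_features = in_features
        self.out_features = out_features
```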