mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Regarding issues encountered with w_bit 3 quantization #231

Open langxinspieder opened 4 weeks ago

langxinspieder commented 4 weeks ago

Very nice work! I have run into a problem: I can quantize with w_bit 4 and group size 128, but are the w_bit 3, group size 128 results reported in the paper obtained from real quantized weights, or are they perplexity numbers from simulated quantization? I ran the same command that works for 4-bit quantization but changed w_bit 4 to w_bit 3, and it fails with the error shown in the attached screenshot. What should I do?
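For context, perplexity for a 3-bit configuration can be evaluated without 3-bit GPU kernels by simulated (pseudo) quantization: the weights are rounded to the low-bit grid and immediately dequantized back to floating point, so the model still runs with ordinary dense kernels. The snippet below is a minimal sketch of that idea, not the repository's actual implementation; the function name, the asymmetric min/max scheme, and the tensor shapes are illustrative assumptions.

```python
# Minimal sketch of group-wise pseudo (simulated) weight quantization.
# NOT the repo's implementation; it only illustrates why 3-bit perplexity
# can be measured without 3-bit kernels: weights are quantized and then
# dequantized back to floating point before the forward pass.
import torch

def pseudo_quantize_tensor(w: torch.Tensor, n_bit: int = 3,
                           group_size: int = 128) -> torch.Tensor:
    """Quantize-dequantize `w` per group of `group_size` input channels."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w_g = w.reshape(out_features, in_features // group_size, group_size)

    # Asymmetric per-group min/max quantization.
    max_val = w_g.amax(dim=-1, keepdim=True)
    min_val = w_g.amin(dim=-1, keepdim=True)
    qmax = 2 ** n_bit - 1
    scales = (max_val - min_val).clamp(min=1e-5) / qmax
    zeros = (-min_val / scales).round()

    # Round to the integer grid, then map back to floating point.
    w_q = torch.clamp((w_g / scales).round() + zeros, 0, qmax)
    w_dq = (w_q - zeros) * scales
    return w_dq.reshape(out_features, in_features)

# Example: simulate 3-bit, group-size-128 quantization of one linear layer.
weight = torch.randn(4096, 4096)
weight_w3g128 = pseudo_quantize_tensor(weight, n_bit=3, group_size=128)
```

Running such a quantize-dequantize pass over every linear layer yields w3 perplexity numbers with ordinary fp16 kernels; producing real packed 3-bit weights is a separate step, and that is where the error below occurs.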

terarachang commented 6 days ago

Hi, I got a similar issue when generating real quantized weights (w3).

  File "/home/username/llm-awq/awq/quantize/qmodule.py", line 83, in __init__
    raise NotImplementedError("Only 4-bit are supported for now.")

Any solutions?
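For reference, the check that the traceback points at is, in simplified form, something like the guard below. This is a sketch reconstructed from the error message, not the actual contents of awq/quantize/qmodule.py; the real constructor also allocates the packed qweight, scales, and zeros buffers.

```python
# Simplified sketch reconstructed from the traceback; not the actual
# qmodule.py. The packed-weight module rejects any bit width other than 4
# at construction time, so real 3-bit quantization fails before any CUDA
# kernel is ever launched.
import torch.nn as nn

class WQLinearSketch(nn.Module):
    def __init__(self, w_bit: int, group_size: int,
                 in_features: int, out_features: int):
        super().__init__()
        if w_bit != 4:
            # The INT4 weight-packing layout and its matching GPU kernels
            # are what this restriction reflects.
            raise NotImplementedError("Only 4-bit are supported for now.")
        self.w_bit = w_bit
        self.group_size = group_size
        self.in_features = in_features
        self.out_features = out_features
```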