mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

Possible Bug in "_search_module_scale" Function #163

Open satabios opened 3 months ago


In the function `_search_module_scale` (https://github.com/mit-han-lab/llm-awq/blob/79019832efd37e4c24a695442880190858aa605e/awq/quantize/auto_scale.py#L131):

```python
for fc in linears2scale:
    fc.weight.mul_(scales.view(1, -1).to(fc.weight.device))
    fc.weight.data = w_quantize_func(fc.weight.data) / (scales.view(1, -1))
```

The FC weights are updated in place for each scale candidate in the grid search. Shouldn't the weights be reset to their original values before the next iteration? Otherwise, wouldn't the scale values compound across grid points?

Or am I missing something here?
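To illustrate the concern, here is a minimal sketch of the reset pattern that would keep grid points independent: snapshot the weight once, apply the scale-quantize-rescale step for each candidate, then restore the snapshot before the next candidate. The grid ratios, the per-channel scales, and the round-to-nearest quantizer below are all toy stand-ins, not the actual AWQ code.

```python
import torch

torch.manual_seed(0)

def w_quantize_func(w, n_bits=4):
    # Toy symmetric round-to-nearest quantizer (illustrative stand-in).
    q_max = 2 ** (n_bits - 1) - 1
    s = w.abs().max() / q_max
    return (w / s).round().clamp(-q_max, q_max) * s

fc = torch.nn.Linear(4, 4, bias=False)
org_weight = fc.weight.data.clone()  # snapshot before the grid search

for ratio in (0.25, 0.5, 0.75):  # hypothetical grid points
    scales = torch.tensor([1.0, 2.0, 4.0, 8.0]) ** ratio
    # Same in-place pattern as the quoted snippet:
    fc.weight.data.mul_(scales.view(1, -1))
    fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)
    # ... evaluate reconstruction error for this candidate here ...
    # Restore the snapshot so the next candidate starts from pristine weights;
    # without this line, each candidate operates on the already
    # quantized-and-rescaled weights of the previous one.
    fc.weight.data = org_weight.clone()

assert torch.equal(fc.weight.data, org_weight)
```

Without the restore step, iteration 2 quantizes `Q(w * s1) / s1` rather than the original `w`, so quantization effects from earlier candidates would leak into later ones.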