mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License

Is this a bug for the quantization phase? #193

Open sleepwalker2017 opened 4 months ago

sleepwalker2017 commented 4 months ago

In llm-awq/awq/quantize/quantizer.py (at main · mit-han-lab/llm-awq) I see the following code, and it looks strange:

    if zero_point:
        max_val = w.amax(dim=1, keepdim=True)
        min_val = w.amin(dim=1, keepdim=True)
        max_int = 2**n_bit - 1
        min_int = 0
        scales = (max_val - min_val).clamp(min=1e-5) / max_int
        zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)

Why do you clamp_ the zeros to [min_int, max_int]?

I did a test here:

    import torch

    group_size = 8
    w_bit = 4

    w = torch.tensor([10 + 0.5 * x for x in range(1, 9)])
    w = w.reshape(-1, group_size)
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**w_bit - 1
    min_int = 0

    print("w is", w)
    print(f"max_val is {max_val}, min_val {min_val}, max_int: {max_int}, min_int: {min_int}")
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    print("scales", scales)

    zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)
    #zeros = -torch.round(min_val/scales)
    print("zeros", zeros)

    q = torch.round(w / scales) + zeros
    print("before clip", q)

    q = torch.clamp(torch.round(w / scales) + zeros, min_int, max_int)
    print("after clip", q)

    w = (q - zeros) * scales
    print("after dequant", w)

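When I use zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int), the dequantized data is bad: the zero point is clamped from -45 to 0, every quantized value saturates at max_int, and every element comes back as 3.5.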

When I use zeros = -torch.round(min_val/scales) instead, the dequantized data looks good.

Any comments on this? Thank you! @ys-2020 @kentang-mit @tonylins
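To make the comparison easier to reproduce, here is a minimal sketch that wraps both variants in one helper and reports the round-trip error. The helper name pseudo_quantize_group, its clamp_zero flag, and the second test group w_mix are hypothetical and introduced only for illustration; they are not part of the repository. It suggests that the clamp only changes the result when a group's range does not contain zero. One plausible reason for the clamp (an assumption, not confirmed by the maintainers) is that the zero point has to fit in the same unsigned n-bit range as the quantized values.

```python
import torch


def pseudo_quantize_group(w, n_bit=4, clamp_zero=True):
    """Quantize then dequantize each row of w with an asymmetric zero point.

    Hypothetical helper mirroring the snippet quoted above; clamp_zero
    toggles the clamp on the zero point that this issue asks about.
    """
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**n_bit - 1
    min_int = 0
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = -torch.round(min_val / scales)
    if clamp_zero:
        zeros = zeros.clamp(min_int, max_int)
    q = torch.clamp(torch.round(w / scales) + zeros, min_int, max_int)
    return (q - zeros) * scales


# Group from the test above: every value is positive (10.5 ... 14.0).
w_pos = torch.tensor([[10 + 0.5 * x for x in range(1, 9)]])
# Hypothetical group whose range contains zero.
w_mix = torch.tensor([[-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]])

for name, w in [("all-positive", w_pos), ("straddles zero", w_mix)]:
    for clamp_zero in (True, False):
        w_dq = pseudo_quantize_group(w, clamp_zero=clamp_zero)
        err = (w_dq - w).abs().max().item()
        print(f"{name:15s} clamp_zero={clamp_zero!s:5s} max abs error = {err:.4f}")
```

For the all-positive group, the clamped variant collapses every value to 3.5 (max_int * scales), while the unclamped variant stays within about half a quantization step of the input. For the group that straddles zero, the zero point already lies in [min_int, max_int], so both variants return the same tensor.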

Yutong-Dai commented 4 months ago

Some related questions:

  1. How does the function pseudo_quantize_tensor relate to equation (1) in the paper? It looks like $\Delta$ is defined differently in this function than in the paper.
  2. In equation (1), $Q(w)$ may not even be an integer if $\Delta$ is not an integer. Am I missing something?
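For the second question, a small sketch may help. It assumes equation (1) has the symmetric form $Q(w) = \Delta \cdot \mathrm{Round}(w / \Delta)$ with $\Delta = \max(|w|) / 2^{N-1}$; that form is one reading of the paper and is not taken from the repository, and the function name symmetric_pseudo_quantize is hypothetical. Under that reading, $Q(w)$ is the dequantized ("pseudo-quantized") value, an integer multiple of $\Delta$, and the integers are the codes $\mathrm{Round}(w / \Delta)$.

```python
import torch


def symmetric_pseudo_quantize(w, n_bit=4):
    """Sketch of the assumed form of equation (1).

    Returns the dequantized values Q(w), the integer codes Round(w / delta),
    and the step size delta. Hypothetical helper, not the repository's API.
    """
    delta = w.abs().max() / 2 ** (n_bit - 1)
    codes = torch.round(w / delta)  # integer-valued, stored as floats
    return delta * codes, codes, delta


w = torch.tensor([0.9, -0.37, 0.12, -1.3, 0.55])
q_w, codes, delta = symmetric_pseudo_quantize(w)

print("delta:", delta.item())
print("codes:", codes)  # whole numbers
print("Q(w): ", q_w)    # multiples of delta, generally not whole numbers
```

By contrast, the asymmetric branch quoted at the top of this issue derives its step size from max_val - min_val and adds a zero point, which may be why $\Delta$ appears to be defined differently there.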