Is this a bug in the quantization phase?

I was reading llm-awq/awq/quantize/quantizer.py on the main branch of mit-han-lab/llm-awq, and this code looks strange to me:

    if zero_point:
        max_val = w.amax(dim=1, keepdim=True)
        min_val = w.amin(dim=1, keepdim=True)
        max_int = 2**n_bit - 1
        min_int = 0
        scales = (max_val - min_val).clamp(min=1e-5) / max_int
        zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)

Why do you clamp the zeros to [min_int, max_int] with clamp_?
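For reference, here is the algebra behind the question. This is my own derivation using the names from the snippet, with N = max_int = 2^n_bit - 1 and s = scales, and it ignores the 1e-5 floor on max - min. It suggests the clamp is a no-op exactly when a group's range contains zero:

$$
s = \frac{\max(w) - \min(w)}{N}, \qquad
z = -\operatorname{round}\left(\frac{\min(w)}{s}\right), \qquad
q = \operatorname{clamp}\left(\operatorname{round}\left(\frac{w}{s}\right) + z,\; 0,\; N\right), \qquad
\hat{w} = (q - z)\, s
$$

$$
-\frac{\min(w)}{s} = N \cdot \frac{-\min(w)}{\max(w) - \min(w)}
$$

If min(w) ≤ 0 ≤ max(w), then 0 ≤ -min(w) ≤ max(w) - min(w), so the ratio lies in [0, 1], z lies in [0, N], and the clamp changes nothing. If every weight in the group is positive, the ratio is negative, so z ≤ 0 is raised to 0; if every weight is negative, the ratio exceeds 1, so z ≥ N is lowered to N. In both cases the clamped z no longer maps the group's range onto [0, N], and q saturates.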

I ran a small test:

    import torch

    group_size = 8
    w_bit = 4

    # One group of 8 strictly positive weights: 10.5, 11.0, ..., 14.0
    w = torch.tensor([10 + 0.5 * x for x in range(1, 9)])
    w = w.reshape(-1, group_size)
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**w_bit - 1
    min_int = 0

    print("w is", w)
    print(f"max_val: {max_val}, min_val: {min_val}, max_int: {max_int}, min_int: {min_int}")
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    print("scales", scales)

    # Zero point as computed in quantizer.py (clamped to [min_int, max_int])
    zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)
    # Unclamped alternative:
    # zeros = -torch.round(min_val / scales)
    print("zeros", zeros)

    q = torch.round(w / scales) + zeros
    print("before clip", q)

    q = torch.clamp(q, min_int, max_int)
    print("after clip", q)

    w_dq = (q - zeros) * scales
    print("after dequant", w_dq)
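To check when the clamp actually matters, the sketch below runs the same zero-point path with and without the clamp on two groups: the all-positive group from the test above and a group that straddles zero. The helper roundtrip is hypothetical (not part of llm-awq); it only mirrors the quoted zero_point branch plus the usual quantize/dequantize steps.

```python
import torch


def roundtrip(w: torch.Tensor, n_bit: int = 4, clamp_zero: bool = True) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper: asymmetric fake-quantization, one row = one group.

    clamp_zero toggles the clamp of the zero point to [min_int, max_int].
    Returns (zeros, dequantized weights).
    """
    max_val = w.amax(dim=1, keepdim=True)
    min_val = w.amin(dim=1, keepdim=True)
    max_int = 2**n_bit - 1
    min_int = 0
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = -torch.round(min_val / scales)
    if clamp_zero:
        zeros = zeros.clamp(min_int, max_int)
    q = torch.clamp(torch.round(w / scales) + zeros, min_int, max_int)
    return zeros, (q - zeros) * scales


# Same group as the test above (all positive)
positive = torch.tensor([[10 + 0.5 * x for x in range(1, 9)]])
# A group containing both signs, as typical weight groups do
straddling = torch.tensor([[-1.0, -0.6, -0.2, 0.1, 0.3, 0.7, 1.2, 2.0]])

for name, w in [("positive", positive), ("straddling", straddling)]:
    for clamp_zero in (True, False):
        zeros, w_dq = roundtrip(w, clamp_zero=clamp_zero)
        err = (w - w_dq).abs().max().item()
        print(f"{name:10s} clamp={clamp_zero!s:5s} zero={zeros.item():6.1f} max|err|={err:.4f}")
```

For the straddling group both variants produce identical results; only the all-positive group is affected by the clamp.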

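When I use zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int), the dequantized data is wrong: the zero point of -45 is clamped to 0, every q saturates at 15 after clipping, and every weight dequantizes to 3.5.

When I use zeros = -torch.round(min_val / scales) instead, the dequantized data looks good: every weight is recovered to within about 0.1.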
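One possible fix that keeps the zero point inside [min_int, max_int] (so it can still be stored as an n-bit integer) is to widen each group's range so it always contains 0 before computing the scale. As far as I know, PyTorch's torch.ao.quantization observers take the same approach for affine quantization. The sketch below applies that idea to the test group; fake_quant_include_zero is a hypothetical helper, not what llm-awq does.

```python
import torch


def fake_quant_include_zero(w: torch.Tensor, n_bit: int = 4) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical variant: widen each group's [min, max] to contain 0.

    With min_val <= 0 <= max_val, -round(min_val / scales) already lies in
    [0, 2**n_bit - 1], so the zero point never needs clamping.
    Returns (zeros, dequantized weights).
    """
    max_int = 2**n_bit - 1
    min_val = w.amin(dim=1, keepdim=True).clamp(max=0)  # force min_val <= 0
    max_val = w.amax(dim=1, keepdim=True).clamp(min=0)  # force max_val >= 0
    scales = (max_val - min_val).clamp(min=1e-5) / max_int
    zeros = -torch.round(min_val / scales)  # in [0, max_int] by construction
    q = torch.clamp(torch.round(w / scales) + zeros, 0, max_int)
    return zeros, (q - zeros) * scales


w = torch.tensor([[10 + 0.5 * x for x in range(1, 9)]])  # group from the test above
zeros, w_dq = fake_quant_include_zero(w)
print("zeros", zeros)
print("after dequant", w_dq)
print("max abs error", (w - w_dq).abs().max().item())
```

The trade-off is precision: for this group the step grows from 3.5/15 ≈ 0.23 to 14/15 ≈ 0.93, so the worst-case error grows from about 0.12 to about 0.47. The other option is to keep the zero point unclamped, as the commented-out line in the test does, but then it needs a wider storage type than n_bit unsigned.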

Any comments on this? Thank you! @ys-2020 @kentang-mit @tonylins

mit-han-lab/llm-awq issue #193