How does the function pseudo_quantize_tensor relate to equation (1) in the paper? It looks like $\Delta$ is defined differently in this function than in the paper.
In equation (1), $Q(w)$ may not even be an integer if $\Delta$ is not an integer. Am I missing something?
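For reference, here is a minimal sketch of the distinction this question turns on. It assumes equation (1) has the form $Q(w) = \Delta \cdot \mathrm{Round}(w / \Delta)$ with $\Delta = \max(|w|) / 2^{N-1}$; that form is not quoted in this thread, so treat it as an assumption. Under that reading, the integer codes are $\mathrm{Round}(w / \Delta)$, while $Q(w)$ is the value after scaling back to floating point, so it does not need to be an integer. The function name `symmetric_fake_quant` is hypothetical and is not part of the llm-awq repository.

```python
def symmetric_fake_quant(w, n_bit=4):
    """Symmetric quantize-then-dequantize of a list of floats.

    Assumed form of equation (1): Q(w) = delta * Round(w / delta),
    with delta = max(|w|) / 2^(n_bit - 1). Hypothetical helper.
    """
    delta = max(abs(x) for x in w) / 2 ** (n_bit - 1)
    # Integer codes: this is the part that is stored as integers.
    codes = [round(x / delta) for x in w]
    # Dequantized values: floats on a grid with spacing delta.
    dequant = [delta * q for q in codes]
    return delta, codes, dequant


w = [0.30, -0.12, 0.05, -0.27]
delta, codes, dequant = symmetric_fake_quant(w)
print("delta  :", delta)
print("codes  :", codes)    # integers
print("dequant:", dequant)  # floats, multiples of delta
```

If this reading is right, the code in the repository differs mainly in using an asymmetric (min/max with zero point) grid rather than the symmetric one sketched here.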
In llm-awq/awq/quantize/quantizer.py (main branch of mit-han-lab/llm-awq), I see this code and it looks strange:
why do you clamp_ the zeros to [min_int, max_int]?
I ran a test. When I use
zeros = (-torch.round(min_val / scales)).clamp_(min_int, max_int)
the dequantized data is bad. When I use
zeros = -torch.round(min_val / scales)
the dequantized data seems good. Any comments on this? Thank you! @ys-2020 @kentang-mit @tonylins
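A minimal sketch that reproduces the effect on a single group, in plain Python rather than torch so it runs without dependencies. It assumes the surrounding computation is the usual asymmetric scheme (scale from `max_val - min_val`, codes clamped to `[min_int, max_int]`); the function `pseudo_quantize_group` and its `clamp_zeros` flag are hypothetical and only mirror the two `zeros` lines above. In this sketch the clamp only changes the result when every value in the group has the same sign, because then the zero point falls outside `[min_int, max_int]`.

```python
def pseudo_quantize_group(w, n_bit=4, clamp_zeros=True):
    """Asymmetric quantize-then-dequantize of one group (list of floats).

    Hypothetical scalar re-implementation for illustration only.
    """
    max_int = 2 ** n_bit - 1
    min_int = 0
    max_val, min_val = max(w), min(w)
    scale = max(max_val - min_val, 1e-5) / max_int
    zero = -round(min_val / scale)
    if clamp_zeros:
        # Mirrors .clamp_(min_int, max_int) applied to the zero point.
        zero = min(max(zero, min_int), max_int)
    out = []
    for x in w:
        q = min(max(round(x / scale) + zero, min_int), max_int)
        out.append((q - zero) * scale)
    return out


def max_abs_error(w, w_hat):
    return max(abs(a - b) for a, b in zip(w, w_hat))


# All-positive group: the unclamped zero point is negative.
w_pos = [1.0, 1.25, 1.5, 2.0]
err_clamped = max_abs_error(w_pos, pseudo_quantize_group(w_pos, clamp_zeros=True))
err_unclamped = max_abs_error(w_pos, pseudo_quantize_group(w_pos, clamp_zeros=False))
print("all-positive group, clamped zeros  :", err_clamped)
print("all-positive group, unclamped zeros:", err_unclamped)

# Group that straddles zero: the zero point is already in range.
w_mix = [-0.5, 0.0, 0.25, 1.0]
same = pseudo_quantize_group(w_mix, clamp_zeros=True) == pseudo_quantize_group(
    w_mix, clamp_zeros=False
)
print("mixed-sign group, outputs identical:", same)
```

For the all-positive group, the clamped variant collapses every value to the group minimum, while the unclamped variant stays within half a quantization step. One possible motivation for the clamp is that it keeps the zero point representable as an unsigned n-bit integer, but that is a guess, not something confirmed in this thread.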