mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

The upper and lower bounds do not seem to fit in 8 bits in some cases #96

Open zhangyu68 opened 3 weeks ago

zhangyu68 commented 3 weeks ago

I noticed that the rounding method uses round(), which can produce values ranging from -128 to 128 rather than -128 to 127. So it may not be a true 8-bit quantization in some cases; at least I observed this problem on Llama 3. I guess this has something to do with the distribution of the parameters. Can you give me some advice?
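A minimal sketch of the concern (my own illustration, not SmoothQuant's actual kernel): with symmetric quantization, if a runtime value exceeds the range the scale was calibrated for, round() alone lands on 128, which overflows int8. Clamping after rounding keeps the result in [-128, 127]. The scale and input values below are hypothetical:

```python
import numpy as np

def quantize_naive(x, scale):
    # Naive symmetric quantization: round only, no clamping.
    return np.round(x / scale)

def quantize_clamped(x, scale):
    # Saturating variant: clamp the rounded result to the int8 range.
    return np.clip(np.round(x / scale), -128, 127)

# Hypothetical scale calibrated for abs-max 1.0; a runtime activation
# slightly exceeds that range, so x / scale goes past 127.5.
scale = 1.0 / 127
x = np.array([1.004, -1.01, 0.5])
print(quantize_naive(x, scale))    # contains 128, outside int8
print(quantize_clamped(x, scale))  # saturated into [-128, 127]
```

The usual fix is exactly this saturating clamp (as done by e.g. torch.clamp in PyTorch int8 pipelines), so out-of-calibration values saturate at ±127/-128 instead of overflowing.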