I noticed that the quantization rounds with round(), which can produce values ranging from -128 to 128 rather than -128 to 127. Since 128 does not fit in a signed 8-bit integer, the result is not strictly 8-bit quantization in some cases. I found this problem on llama3, and I guess it has something to do with the distribution of the parameters.
Can you give me some advice?
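
Here is a minimal sketch of what I think is happening. It is not the actual code from the repository: the function names, the scale choice (absmax / 128), and the sample weights are all my own assumptions, used only to show how round() without a clamp can reach 128.

```python
# Hypothetical symmetric quantizer: scale = absmax / 128 is an assumption,
# not necessarily the scale used by the real implementation.
def quantize_round_only(weights):
    scale = max(abs(w) for w in weights) / 128
    # Rounding alone: the largest positive weight maps to exactly 128.
    return [round(w / scale) for w in weights], scale


def quantize_clamped(weights):
    q, scale = quantize_round_only(weights)
    # Clamp into the signed 8-bit range [-128, 127] after rounding.
    return [max(-128, min(127, v)) for v in q], scale


# Made-up weights whose largest positive value equals the absolute maximum.
weights = [-0.5, 0.1, 0.25, 0.5]

q_raw, _ = quantize_round_only(weights)
q_ok, _ = quantize_clamped(weights)

print(max(q_raw))            # 128, which does not fit in int8
print(min(q_ok), max(q_ok))  # -128 127
```

If this matches the real behavior, clamping after rounding (or dividing by 127 instead of 128 when computing the scale) would keep everything inside the int8 range. Whether 128 shows up would then depend on whether the largest-magnitude weight in a tensor is positive, which could explain why I only see it on some models.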