opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

[Bug] maybe a bug in fake_quant_error_simulation function #5

Closed HarryWu99 closed 4 months ago

HarryWu99 commented 4 months ago

https://github.com/opengear-project/GEAR/blob/79ad3fcdb528fceaf605923479fe14fdf3953ffd/TrueCompressionLlaMA/models/TrueCompressFunction.py#L128

error = input - torch.round((input - min) / step)

but it should be `error = input - (torch.round((input - min) / step) * step + min)`, so that `error` is the quantization residual: the rounded integer levels must be mapped back to the original value range (multiplied by `step` and offset by `min`) before subtracting from `input`.
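For reference, a minimal sketch of the corrected computation, assuming standard uniform quantization where `min` and `step` are derived from the tensor's range (the function name and signature here are illustrative, not GEAR's actual API):

```python
import torch

def fake_quant_error(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Uniform quantization parameters over the tensor's range
    # (illustrative sketch; GEAR's real code is in TrueCompressFunction.py).
    mn, mx = x.min(), x.max()
    step = (mx - mn) / (2 ** bits - 1)
    q = torch.round((x - mn) / step)   # integer quantization levels
    dequant = q * step + mn            # map levels back to the value range
    return x - dequant                 # quantization residual
```

With the dequantization step included, the residual is bounded by half a quantization step, which is what a downstream low-rank/sparse error correction would expect to model.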

HaoKang-Timmy commented 4 months ago

Thanks for reminding us. We have fixed this error in our new version with fused quantized operator support. We will upload it soon.