opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

[Bug] maybe a bug in fake_quant_error_simulation function #5

Closed HarryWu99 closed 4 months ago

HarryWu99 commented 4 months ago

https://github.com/opengear-project/GEAR/blob/79ad3fcdb528fceaf605923479fe14fdf3953ffd/TrueCompressionLlaMA/models/TrueCompressFunction.py#L128

error = input - torch.round((input - min) / step)

but it should be `error = input - (torch.round((input - min) / step) * step + min)`, so that `error` is the quantization residual: the rounded integer levels must be mapped back to the original value range (multiplied by `step` and offset by `min`) before subtracting from `input`.
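For reference, a minimal sketch of the corrected computation, assuming standard uniform quantization where `min` and `step` are derived from the tensor's range (the function name and signature here are illustrative, not GEAR's actual API):

```python
import torch

def fake_quant_error(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Uniform quantization parameters over the tensor's range
    # (illustrative sketch; GEAR's real code is in TrueCompressFunction.py).
    mn, mx = x.min(), x.max()
    step = (mx - mn) / (2 ** bits - 1)
    q = torch.round((x - mn) / step)   # integer quantization levels
    dequant = q * step + mn            # map levels back to the value range
    return x - dequant                 # quantization residual
```

With the dequantization step included, the residual is bounded by half a quantization step, which is what a downstream low-rank/sparse error correction would expect to model.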

HaoKang-Timmy commented 4 months ago

Thanks for reminding us. We have fixed this error in our new version with fused quantized operator support. We will upload it soon.