jy-yuan / KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
https://arxiv.org/abs/2402.02750
MIT License

How can the code support 1-bit quantization? #26

Closed: yuhuixu1993 closed this issue 4 weeks ago

yuhuixu1993 commented 1 month ago

Hi, thanks to the authors for presenting this great work. I am very interested in how KIVI performs with 1-bit quantization. However, it does not seem to work; the error is below. Any ideas? Many thanks.

hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions
zirui-ray-liu commented 1 month ago

Thank you for your interest in our work!

We did not implement a 1-bit quantized matmul CUDA kernel. Currently, we have only implemented the 4-bit and 2-bit kernels.
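A kernel written for one bit width generally cannot be reused for another, because the way low-bit codes are packed into machine words depends on the width. The sketch below is a hypothetical illustration in plain Python, not KIVI's actual packing layout: it packs unsigned codes into 32-bit words, so the number of codes per word (32 at 1 bit, 16 at 2 bits, 8 at 4 bits) and the shift/mask logic both change with `bits`. The function names `pack_codes` and `unpack_codes` are made up for this example.

```python
def pack_codes(codes: list[int], bits: int) -> list[int]:
    """Hypothetical helper: pack unsigned `bits`-bit codes into 32-bit words, lowest bits first."""
    if 32 % bits != 0:
        raise ValueError("bits must divide 32")
    per_word = 32 // bits          # 32 codes per word at 1 bit, 16 at 2 bits, 8 at 4 bits
    mask = (1 << bits) - 1         # largest code that fits in `bits` bits
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for i, code in enumerate(codes[start:start + per_word]):
            if not 0 <= code <= mask:
                raise ValueError(f"code {code} does not fit in {bits} bits")
            word |= code << (i * bits)  # place code i in its own bit slot
        words.append(word)
    return words


def unpack_codes(words: list[int], bits: int, n: int) -> list[int]:
    """Hypothetical inverse of pack_codes: recover the first n codes."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    codes = [(w >> (i * bits)) & mask for w in words for i in range(per_word)]
    return codes[:n]
```

For example, `pack_codes([3, 0, 1], bits=2)` returns `[19]` (3 + (0 << 2) + (1 << 4)). A GPU kernel that hard-codes the 2-bit or 4-bit packing factor would index the packed buffer differently from what 1-bit data needs, which is one plausible (unverified) way to end up with an out-of-bounds access.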

On the accuracy side, you can evaluate 1-bit KIVI through simulation, i.e., fake quantization (using floating-point numbers to simulate the 1-bit integers).
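Fake quantization here means quantizing a tensor to integer codes and immediately dequantizing it, so the model runs in ordinary floating point but only "sees" values representable at the chosen bit width. The NumPy sketch below assumes asymmetric min/max quantization (a per-group zero point and scale), in line with the "asymmetric" in KIVI's title; the function name `fake_quant_asym` and the `axis` parameter are illustrative assumptions, not part of the KIVI codebase. With `bits=1`, every element snaps to either its group's minimum or maximum.

```python
import numpy as np


def fake_quant_asym(x: np.ndarray, bits: int, axis: int) -> np.ndarray:
    """Simulate asymmetric `bits`-bit quantization of `x` in floating point.

    Min/max statistics are computed along `axis`, so every slice orthogonal
    to that axis gets its own zero point and scale.
    """
    levels = 2 ** bits - 1                        # 1 step at 1 bit, 3 at 2 bits, 15 at 4 bits
    x_min = x.min(axis=axis, keepdims=True)       # zero point of each group
    x_max = x.max(axis=axis, keepdims=True)
    scale = (x_max - x_min) / levels
    scale = np.where(scale == 0, 1.0, scale)      # constant groups: avoid dividing by zero
    q = np.clip(np.round((x - x_min) / scale), 0, levels)  # integer codes in [0, levels]
    return q * scale + x_min                      # dequantize back to floats
```

For a cache slice shaped `[tokens, head_dim]`, `axis=0` gives per-channel statistics and `axis=1` gives per-token statistics; the KIVI paper quantizes keys per-channel and values per-token, so a simulation would apply the function accordingly before attention. Because the 1-bit grid {min, max} is a subset of the 2-bit grid, which is in turn a subset of the 4-bit grid, the simulated error can only shrink as `bits` grows, which makes it easy to compare 1-bit against the 2-bit and 4-bit settings.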

yuhuixu1993 commented 4 weeks ago

@zirui-ray-liu Sure, thank you!