How can the code support 1-bit quantization?

yuhuixu1993 commented 1 month ago

Hi, thanks to the authors for presenting this great work. I am very interested in the performance of KIVI with 1-bit quantization. However, it does not seem to work. The errors are below. Any ideas about that? Many thanks.

hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

zirui-ray-liu commented 1 month ago

Thank you for your interest in our work!

We have not implemented a 1-bit quantized matmul CUDA kernel. Currently we have only implemented the 4-bit and 2-bit kernels.

On the accuracy side, you can check 1-bit KIVI performance through simulation, i.e. fake quantization (using floating-point numbers to simulate the 1-bit integers).
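
To illustrate the fake-quantization idea, here is a minimal sketch. It is not the KIVI implementation: `fake_quantize` is a hypothetical helper, it uses NumPy rather than the repo's CUDA path, and it assumes plain asymmetric min-max quantization over one axis without grouping. Each value is rounded to one of `2**bits` levels and immediately mapped back to floating point, so no low-bit kernel is needed.

```python
import numpy as np

def fake_quantize(x, bits=1, axis=-1):
    """Quantize then dequantize x in floating point (hypothetical helper).

    Assumes asymmetric min-max quantization along `axis`; with bits=1
    every value collapses to roughly the min or the max of its slice.
    """
    levels = 2 ** bits - 1                       # 1 for 1-bit, 3 for 2-bit
    x_min = x.min(axis=axis, keepdims=True)
    x_max = x.max(axis=axis, keepdims=True)
    scale = (x_max - x_min) / levels
    scale = np.where(scale == 0, 1.0, scale)     # guard constant slices
    q = np.clip(np.round((x - x_min) / scale), 0, levels)  # integer codes stored as floats
    return q * scale + x_min                     # dequantized values

# Toy "key cache" of shape (tokens, channels); the values are random placeholders.
rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 8)).astype(np.float32)

keys_1bit = fake_quantize(keys, bits=1, axis=0)  # statistics taken over tokens, per channel
mean_abs_err = np.abs(keys - keys_1bit).mean()
print("mean absolute error at 1 bit:", mean_abs_err)
```

Substituting the dequantized tensors for the original keys and values in a standard floating-point attention computation gives an estimate of 1-bit accuracy without any custom kernel. The choice of axis and any group size would need to match the configuration being simulated.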

yuhuixu1993 commented 4 weeks ago

@zirui-ray-liu Sure, thank you.

jy-yuan / KIVI

How can the code support 1-bit quantization? #26