Open mlxht990720 opened 2 months ago
Hi, we do not need to store the whole S matrix; instead, we store only the outliers' indices and values. Please refer to here for more details.
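For anyone reading along, here is a rough sketch of what "store only the indices and values" could look like, and how the memory estimate changes. This is not the repository's code. The helper names (`extract_outliers`, `dense_bytes`, `sparse_bytes`), the magnitude threshold used to pick outliers, and the byte widths (2-byte values, 4-byte flat indices) are all assumptions made for illustration.

```python
def extract_outliers(matrix, threshold):
    """Return flat indices and values of entries with |x| > threshold.

    The threshold rule is a stand-in; the actual outlier selection
    in the paper's code may differ.
    """
    n_cols = len(matrix[0])
    indices, values = [], []
    for r, row in enumerate(matrix):
        for c, x in enumerate(row):
            if abs(x) > threshold:
                indices.append(r * n_cols + c)  # row-major flat index
                values.append(x)
    return indices, values


def dense_bytes(n_rows, n_cols, value_bytes=2):
    """Bytes needed to store every entry (assumes 2-byte values)."""
    return n_rows * n_cols * value_bytes


def sparse_bytes(nnz, value_bytes=2, index_bytes=4):
    """Bytes needed to store only nnz (index, value) pairs (assumed widths)."""
    return nnz * (value_bytes + index_bytes)


# Toy S matrix with two outliers.
S = [
    [0.0, 0.0, 9.5, 0.0],
    [0.0, -7.25, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
indices, values = extract_outliers(S, threshold=1.0)
print(indices, values)                       # [2, 5] [9.5, -7.25]
print(dense_bytes(3, 4), sparse_bytes(len(indices)))  # 24 12
```

Under these assumed widths, the index-plus-value form is smaller only when fewer than one third of the entries are outliers, since each stored outlier costs 6 bytes against 2 bytes per dense entry.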
Got it! Thank you very much! I have another question: when I try to reproduce the accuracy results in Table 1 on GSM8K, the accuracy stays the same no matter how I change the configs, such as the compression method or the quantization bit width. Have I missed some instructions?
Thanks for your great work and the open-sourced code! I have a question about how the sparse matrix S is stored. Could you please provide the code to reproduce the KV cache memory size results in your paper? Thanks a lot!!!