opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

Question about storage #7

Open mlxht990720 opened 2 months ago

mlxht990720 commented 2 months ago

Thanks for your great work and the open-sourced code! I have a question about how the sparse matrix S is stored. Could you please provide the code to reproduce the KV cache memory size results in your paper? Thanks a lot!

HaoKang-Timmy commented 2 months ago

Hi, we do not need to store the whole S matrix. Instead, we store only the outliers' indices and values. Please refer to here for more details.
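To make the index-plus-value idea concrete, here is a minimal sketch of storing only outliers and estimating the resulting memory. It is not the repository's code: the helper names (`extract_outliers`, `sparse_bytes`, `dense_bytes`), the largest-magnitude selection rule, and the byte widths (2-byte FP16 values, 4-byte indices) are all assumptions made for illustration.

```python
def extract_outliers(values, ratio):
    """Keep the top `ratio` fraction of entries by magnitude (assumed rule).

    Returns (indices, outlier_values), with indices in ascending order.
    """
    k = int(len(values) * ratio)
    if k == 0:
        return [], []
    # Rank positions by absolute value, largest first, and keep the first k.
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    indices = sorted(ranked[:k])
    return indices, [values[i] for i in indices]


def sparse_bytes(num_outliers, value_bytes=2, index_bytes=4):
    """Memory for index + value pairs (byte widths are assumptions)."""
    return num_outliers * (value_bytes + index_bytes)


def dense_bytes(num_elements, value_bytes=2):
    """Memory for a full dense matrix of the same element count."""
    return num_elements * value_bytes


# Toy flattened tensor with two obvious outliers.
values = [0.1, -5.0, 0.2, 0.3, 7.5, -0.1, 0.05, 0.4, -0.2, 0.15]
indices, outliers = extract_outliers(values, ratio=0.2)
print(indices, outliers)           # positions and values of the kept outliers
print(sparse_bytes(len(indices)))  # bytes for the index + value pairs
print(dense_bytes(len(values)))    # bytes if S were stored densely
```

With a small outlier ratio, the index-plus-value form is much smaller than a dense S, even though each stored entry also pays for its index. The measured numbers in the paper depend on the actual index and value formats used in the repository, which this sketch does not try to reproduce.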

mlxht990720 commented 1 month ago

Got it! Thank you very much! I have another question: when I reproduce the accuracy results in Table 1 on GSM8K, the accuracy stays the same no matter how I change the configs, such as the compression method or the quantization bit width. Have I missed some instructions?