opengear-project / GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
MIT License

Where is the outlier extraction logic in `cuda_support_gear` #13

Closed Ther-nullptr closed 1 month ago

Ther-nullptr commented 1 month ago

Hello, I have read the GEAR paper and the code in the `cuda_support_gear` part. Where is the "outlier extraction" logic in the CUDA/Triton code? I only see the quantization and error correction logic.

HaoKang-Timmy commented 1 month ago

The "outlier extraction" CUDA kernel will be added soon. For now, the CUDA-supported code only covers GEAR-L.
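
For readers unsure what "outlier extraction" refers to, here is a minimal pure-Python sketch of the general idea: pull the largest and smallest entries of a tensor out into a sparse structure before quantizing the rest. This is not the repository's code and not the pending CUDA kernel. The function name `extract_outliers`, the `ratio` parameter, the flat-list layout, and the choice to zero out the extracted positions are all assumptions made for illustration.

```python
def extract_outliers(values, ratio):
    """Split a flat list of floats into (dense, sparse).

    Hypothetical helper: `ratio` is the total fraction of entries treated
    as outliers, taken half from the low tail and half from the high tail.
    """
    n = len(values)
    k = int(n * ratio / 2)  # number of entries taken from each tail
    if k == 0:
        return list(values), {}

    # Indices sorted by value: the first k are the smallest entries,
    # the last k are the largest.
    order = sorted(range(n), key=lambda i: values[i])
    outlier_idx = order[:k] + order[-k:]

    # Sparse part: index -> original value.
    sparse = {i: values[i] for i in outlier_idx}

    # Dense part: outlier positions are zeroed (an assumption of this
    # sketch) so the remaining values span a narrower range.
    dense = [0.0 if i in sparse else v for i, v in enumerate(values)]
    return dense, sparse


values = [0.1, -9.0, 0.2, 0.3, 8.0, -0.1, 0.0, 0.4, -0.2, 0.25]
dense, sparse = extract_outliers(values, ratio=0.2)

# Adding the sparse entries back onto the dense part recovers the input.
restored = [d + sparse.get(i, 0.0) for i, d in enumerate(dense)]
```

With the two extreme entries removed, the dense part has a much smaller value range, which is what makes low-bit quantization of the remainder less lossy. A real kernel would work on GPU tensors and a compact sparse format rather than Python lists and dicts.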