jy-yuan / KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
https://arxiv.org/abs/2402.02750
MIT License

Can this be used with any autoregressive model? #1

Closed hello-fri-end closed 1 month ago

hello-fri-end commented 2 months ago

Hey @jy-yuan, thank you for the awesome paper and code. Is the method/code only applicable to LLaMA, or can it be used with any autoregressive model? If it's the latter, are there instructions on how to quantize the KV cache of an arbitrary transformer model?

zirui-ray-liu commented 2 months ago

Thank you for your interest in our work. Yes, it can be applied to any autoregressive model!

You can think of KIVI as another implementation of the KV cache. You can copy-paste the code here and modify it for the attention module of other models, such as MistralAttention.
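For a rough picture of what such a cache replacement does, here is a minimal, non-optimized sketch of group-wise asymmetric fake quantization in PyTorch. The function names and the `group_size` value are illustrative, not the repository's API; a real 2-bit implementation would pack the codes instead of storing one per byte, and in the paper the key cache is quantized per-channel while the value cache is quantized per-token.

```python
# Sketch only: group-wise asymmetric quantization, the basic operation
# applied to the cached keys/values. Names and defaults are hypothetical.
import torch

def quantize_asym(x: torch.Tensor, n_bits: int = 2, group_size: int = 32):
    """Asymmetric quantization along the last dimension, in groups."""
    orig_shape = x.shape
    x = x.reshape(-1, group_size)                      # (num_groups, group_size)
    x_min = x.min(dim=-1, keepdim=True).values
    x_max = x.max(dim=-1, keepdim=True).values
    scale = (x_max - x_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero_point = (-x_min / scale).round()
    q = torch.clamp((x / scale).round() + zero_point, 0, 2 ** n_bits - 1)
    return q.to(torch.uint8).reshape(orig_shape), scale, zero_point

def dequantize_asym(q: torch.Tensor, scale, zero_point, group_size: int = 32):
    """Reconstruct an approximation of the original tensor."""
    orig_shape = q.shape
    q = q.reshape(-1, group_size).float()
    return ((q - zero_point) * scale).reshape(orig_shape)

# Example: cached keys of shape (batch, kv_heads, seq_len, head_dim).
k = torch.randn(1, 8, 128, 64)
codes, scale, zp = quantize_asym(k)
k_hat = dequantize_asym(codes, scale, zp)
print((k - k_hat).abs().mean())  # average quantization error
```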

We would also like to note two important things we found when extending KIVI to other models.

  1. Transformers Package Version: Please double-check the transformers package version. We tested our implementation with 4.35.2; in versions >= 4.36, the KV cache data structure has changed.

  2. Attention Implementation Variants: Please double-check which attention mechanism the model uses (multi-head / multi-query / grouped-query). Currently we only release the CUDA and Triton code supporting multi-head attention. Multi-query / grouped-query attention requires small changes to the low-level implementation (see the sketch below); we will release this part soon.
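To illustrate point 2: in the Hugging Face Llama/Mistral modeling code, grouped-query attention keeps fewer KV heads in the cache and broadcasts them to the query heads with a `repeat_kv` helper, so a quantized cache (and its kernels) has to handle the smaller-head layout. The sketch below paraphrases that helper; it is not code from this repository.

```python
# Sketch of the shape issue behind point 2. In grouped-query / multi-query
# attention the cache holds fewer KV heads than query heads, so any custom
# (de)quantization kernel must operate on the smaller tensor; the heads are
# only broadcast afterwards.
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq_len, head_dim) to
    (batch, num_kv_heads * n_rep, seq_len, head_dim)."""
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)

# With multi-head attention n_rep == 1 and the cache already matches the
# query heads; with GQA (e.g. 32 query heads, 8 KV heads) n_rep == 4, and
# the quantized cache should stay in the 8-head layout to save memory.
```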

Stay tuned for further developments!