whyNLP / LCKV

Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance. Accepted to ACL 2024.
https://arxiv.org/abs/2405.10637
139 stars 6 forks source link