thunlp / InfLLM

Code for our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
MIT License

Hello, have you tried doing lookup only in certain layers? That would reduce the number of cached KV entries. #17

Closed MrJiangZhongZheng closed 8 months ago

MrJiangZhongZheng commented 8 months ago

https://arxiv.org/pdf/2203.08913.pdf

MrJiangZhongZheng commented 8 months ago

https://arxiv.org/pdf/2203.08913.pdf — this paper suggests that doing lookup only at certain layers may actually work better.

guyan364 commented 8 months ago

We haven't tried this yet. You can modify the patch so that different layers use different streaming attention, for example have layers 0-7 and 24-31 use infinite lm. We plan to add support for this feature later.
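As a rough illustration of that suggestion (not the repo's actual patch code), a per-layer dispatch could look like the sketch below. The helper name `attention_type_for_layer`, the type strings, and the commented patch loop are assumptions for illustration only; the layer ranges follow the example in the comment above.

```python
# Hypothetical sketch: choose an attention type per decoder layer when patching
# the model, so that only some layers keep the block-level KV cache for lookup.
# Names and type strings here are illustrative, not InfLLM's real API.

INFINITE_LM_LAYERS = set(range(0, 8)) | set(range(24, 32))  # layers 0-7 and 24-31

def attention_type_for_layer(layer_idx: int) -> str:
    """Use the lighter "infinite-lm" streaming attention on the chosen layers,
    and keep "inf-llm" lookup attention on the remaining layers."""
    return "infinite-lm" if layer_idx in INFINITE_LM_LAYERS else "inf-llm"

# Inside a patch loop over the model's decoder layers one could then do
# something like (hypothetical patch interface):
#
# for idx, layer in enumerate(model.model.layers):
#     layer.self_attn.forward = build_forward(attention_type_for_layer(idx), config)
```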