Can this be integrated with vLLM?

thunlp / InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

MIT License

Open zhangxii opened 4 months ago

zhangxii commented 4 months ago

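Hello, I see that the code loads models through the `AutoModelForCausalLM` API. Would it be possible to support loading models with vLLM to speed up inference? As far as I know, vLLM also manages the keys and values held in memory, via PagedAttention. Does that conflict with how InfLLM works?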