vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: vAttention #4675

Open nivibilla opened 2 months ago

nivibilla commented 2 months ago

🚀 The feature, motivation and pitch

The paper claims major improvements over vLLM. Unfortunately, there is no code yet, only the paper.

arxiv.org/abs/2405.04437
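
For context while the code is unreleased: the paper's core idea is to keep the KV cache virtually contiguous while committing physical GPU memory on demand via the CUDA virtual memory management (VMM) driver API, so attention kernels need no paging-aware indexing. Below is a minimal, hypothetical sketch of that mechanism based on the paper's abstract; it is not vAttention's actual code, and the sizes and structure are illustrative assumptions.

```c
// Hypothetical sketch of on-demand KV-cache growth using the CUDA VMM
// driver API (cuMemAddressReserve / cuMemCreate / cuMemMap), the kind of
// mechanism the vAttention paper describes. Not the authors' code.
// Build: nvcc sketch.c -lcuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call)                                          \
    do {                                                     \
        CUresult r = (call);                                 \
        if (r != CUDA_SUCCESS) {                             \
            fprintf(stderr, "CUDA error %d at %s:%d\n",      \
                    (int)r, __FILE__, __LINE__);             \
            exit(1);                                         \
        }                                                    \
    } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CUcontext ctx;
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    // Physical allocations must be multiples of the device's
    // allocation granularity (typically 2 MiB).
    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(
        &gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM));

    // Reserve a large contiguous VIRTUAL range for one request's
    // KV cache up front; no physical memory is committed yet.
    size_t max_kv_bytes = 64 * gran;  // illustrative cap
    CUdeviceptr kv_base = 0;
    CHECK(cuMemAddressReserve(&kv_base, max_kv_bytes, 0, 0, 0));

    CUmemAccessDesc access = {0};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

    // As decoding proceeds, commit physical pages one granule at a
    // time and map them at the tail of the reserved range. The KV
    // cache stays virtually contiguous, so an unmodified attention
    // kernel can index it with plain pointer arithmetic.
    size_t committed = 0;
    for (int step = 0; step < 4; ++step) {
        CUmemGenericAllocationHandle h;
        CHECK(cuMemCreate(&h, gran, &prop, 0));
        CHECK(cuMemMap(kv_base + committed, gran, 0, h, 0));
        CHECK(cuMemSetAccess(kv_base + committed, gran, &access, 1));
        // The handle can be released now; the mapping keeps the
        // physical memory alive until it is unmapped.
        CHECK(cuMemRelease(h));
        committed += gran;
        printf("committed %zu bytes, base unchanged\n", committed);
    }

    // Tear down: unmap each granule, then free the virtual range.
    for (size_t off = 0; off < committed; off += gran)
        CHECK(cuMemUnmap(kv_base + off, gran));
    CHECK(cuMemAddressFree(kv_base, max_kv_bytes));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

The contrast with PagedAttention is that the virtual base pointer never changes as the cache grows, which is why the paper argues the approach avoids rewriting attention kernels for paged layouts.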

Alternatives

No response

Additional context

No response

ramyaprabhu-alt commented 2 months ago

Hi, I'm one of the authors of this paper. Thank you for your interest in our work! We plan to release the code soon, hopefully in a few weeks.

Jeffwan commented 1 month ago

@ramyaprabhu-alt Just curious: will the code release be a separate project or a PR against vLLM? I assume it's a PR, right?

ramyaprabhu-alt commented 1 month ago

Our initial release will be a separate project built on a slightly older version of vLLM. Soon after, we can also raise a PR against the latest vLLM.

apanwariisc commented 3 days ago

Glad to share that the source code of vAttention is now available. Please check it out here: https://github.com/microsoft/vattention