vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: vAttention #4675

Open nivibilla opened 2 months ago

nivibilla commented 2 months ago

🚀 The feature, motivation and pitch

The paper claims major improvements over vLLM. Unfortunately, there is no code yet, only the paper.

arxiv.org/abs/2405.04437
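
For context while the code is unreleased: the paper's core idea is to keep the KV cache virtually contiguous while committing physical GPU memory on demand via the CUDA virtual memory management (VMM) driver API, so attention kernels need no paging-aware indexing. Below is a minimal, hypothetical sketch of that mechanism based on the paper's abstract; it is not vAttention's actual code, and the sizes and structure are illustrative assumptions.

```c
// Hypothetical sketch of on-demand KV-cache growth using the CUDA VMM
// driver API (cuMemAddressReserve / cuMemCreate / cuMemMap), the kind of
// mechanism the vAttention paper describes. Not the authors' code.
// Build: nvcc sketch.c -lcuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call)                                          \
    do {                                                     \
        CUresult r = (call);                                 \
        if (r != CUDA_SUCCESS) {                             \
            fprintf(stderr, "CUDA error %d at %s:%d\n",      \
                    (int)r, __FILE__, __LINE__);             \
            exit(1);                                         \
        }                                                    \
    } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CUcontext ctx;
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    // Physical allocations must be multiples of the device's
    // allocation granularity (typically 2 MiB).
    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(
        &gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM));

    // Reserve a large contiguous VIRTUAL range for one request's
    // KV cache up front; no physical memory is committed yet.
    size_t max_kv_bytes = 64 * gran;  // illustrative cap
    CUdeviceptr kv_base = 0;
    CHECK(cuMemAddressReserve(&kv_base, max_kv_bytes, 0, 0, 0));

    CUmemAccessDesc access = {0};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

    // As decoding proceeds, commit physical pages one granule at a
    // time and map them at the tail of the reserved range. The KV
    // cache stays virtually contiguous, so an unmodified attention
    // kernel can index it with plain pointer arithmetic.
    size_t committed = 0;
    for (int step = 0; step < 4; ++step) {
        CUmemGenericAllocationHandle h;
        CHECK(cuMemCreate(&h, gran, &prop, 0));
        CHECK(cuMemMap(kv_base + committed, gran, 0, h, 0));
        CHECK(cuMemSetAccess(kv_base + committed, gran, &access, 1));
        // The handle can be released now; the mapping keeps the
        // physical memory alive until it is unmapped.
        CHECK(cuMemRelease(h));
        committed += gran;
        printf("committed %zu bytes, base unchanged\n", committed);
    }

    // Tear down: unmap each granule, then free the virtual range.
    for (size_t off = 0; off < committed; off += gran)
        CHECK(cuMemUnmap(kv_base + off, gran));
    CHECK(cuMemAddressFree(kv_base, max_kv_bytes));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

The contrast with PagedAttention is that the virtual base pointer never changes as the cache grows, which is why the paper argues the approach avoids rewriting attention kernels for paged layouts.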

Alternatives

No response

Additional context

No response

ramyaprabhu-alt commented 2 months ago

Hi, I'm one of the authors of this paper. Thank you for your interest in our work! We plan to release the code soon, hopefully in a few weeks.

Jeffwan commented 1 month ago

@ramyaprabhu-alt Just curious: will the code release be a separate project or a PR against vLLM? I assume it's a PR, right?

ramyaprabhu-alt commented 1 month ago

Our initial release will be a separate project built on a slightly older version of vLLM. Soon after, we can also raise a PR against the latest vLLM.

apanwariisc commented 3 days ago

Glad to share that the source code of vAttention is now available. Please check it out here: https://github.com/microsoft/vattention