vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

How to use Splitwise (from Microsoft) in vLLM? #2370

Open Lin-Qingyang-Alec opened 8 months ago

Lin-Qingyang-Alec commented 8 months ago

Microsoft has claimed that "Splitwise" is supported in vLLM; see https://www.microsoft.com/en-us/research/blog/splitwise-improves-gpu-usage-by-splitting-llm-inference-phases/

So how do I use it in vLLM? I could not find the keyword "Splitwise" anywhere.

monuminu commented 8 months ago

same here

Lin-Qingyang-Alec commented 8 months ago

@monuminu From your homepage, it appears that you work at Microsoft. Do you know where this part of the code can be obtained? 😂

monuminu commented 8 months ago

Let me get the info :)

Lin-Qingyang-Alec commented 8 months ago

@monuminu Ok, please keep me posted if there are any updates.

monuminu commented 8 months ago

The code will be released in a few days.

Lin-Qingyang-Alec commented 8 months ago

Thank you, I am looking forward to the day when the code is available.

kd303 commented 8 months ago

According to the blog, this seems to be already implemented in, or part of, vLLM:

"Our approach is now part of vLLM and can also be implemented with other frameworks."

Lin-Qingyang-Alec commented 8 months ago

@kd303 I have already read this blog. But do you know how to use it in vLLM? I cannot find any related keyword in the vLLM code.

aashaka commented 8 months ago

Thanks for the interest in the Splitwise code. We have opened https://github.com/vllm-project/vllm/issues/2472 to track the open sourcing of our internal prototype.

hmellor commented 6 months ago

https://github.com/vllm-project/vllm/pull/2809