Closed goiri closed 3 months ago
This was asked in https://github.com/vllm-project/vllm/issues/2370.
LGTM. I was wondering when we can use it in vLLM?
@irasin, @aashaka is doing some cleanup and refactoring and will be posting the PRs in the next few weeks. We will be updating this issue (and linking the PRs) with the progress.
Hi All,
Just wanted to check in and see if there is any update on Splitwise's implementation in vLLM, and whether the internal prototype codebase can be released.
Thank you!
This has now been released in PR https://github.com/vllm-project/vllm/pull/2809. @adney11, @irasin
We have built the system described in http://aka.ms/splitwise. Splitwise splits the prompt and token phases of LLM inference so that they run on different servers, leveraging the differences between these two phases to improve throughput. We have an internal prototype built on top of an internal vLLM branch. This issue tracks the effort to open-source that prototype and make it part of official vLLM.
This includes:
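To make the phase split concrete, here is a minimal, hypothetical sketch of the idea: a compute-bound prompt (prefill) server produces a KV cache that is handed off to a memory-bound token (decode) server. All names here (`PrefillServer`, `DecodeServer`, `KVCache`, `serve`) are illustrative and are not vLLM's or Splitwise's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of Splitwise-style phase splitting; not vLLM's API.

@dataclass
class KVCache:
    # Stands in for the per-request key/value attention cache that the
    # prompt server transfers to the token server.
    prompt_tokens: list

class PrefillServer:
    """Runs the compute-bound prompt phase and emits a KV cache."""
    def process_prompt(self, prompt: str) -> KVCache:
        tokens = prompt.split()  # toy "tokenizer" for illustration only
        return KVCache(prompt_tokens=tokens)

class DecodeServer:
    """Runs the memory-bound token phase from a transferred KV cache."""
    def generate(self, kv: KVCache, max_new_tokens: int) -> list:
        out = []
        for i in range(max_new_tokens):
            # A real decode step would attend over the KV cache; here we
            # just emit placeholder tokens indexed past the prompt.
            out.append(f"tok{len(kv.prompt_tokens) + i}")
        return out

def serve(prompt: str, max_new_tokens: int = 3) -> list:
    # A scheduler would route the prompt phase to one machine and the
    # token phase to another, transferring the KV cache in between.
    kv = PrefillServer().process_prompt(prompt)
    return DecodeServer().generate(kv, max_new_tokens)
```

Because the two phases have different bottlenecks, disaggregating them lets each server pool be sized and provisioned for its own workload.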