Closed: leiwen83 closed this issue 5 months ago
I wrote a draft to support vLLM: https://github.com/Duyi-Wang/vllm
It does not support distributed inference or continuous batching yet.
A fork of vLLM has been created to integrate the xFasterTransformer backend while maintaining compatibility with most of the official vLLM features: https://github.com/intel/xFasterTransformer/blob/main/serving/vllm-xft.md
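Since the fork is described as keeping most of official vLLM's features, it presumably also exposes vLLM's standard OpenAI-compatible completions endpoint. Below is a minimal sketch of what a request body for that endpoint looks like; the model name, host, and port are placeholders, not taken from the thread.

```python
import json

# Sketch of a request payload for vLLM's OpenAI-compatible
# /v1/completions endpoint, which the xFasterTransformer fork
# reportedly remains compatible with.
# "placeholder-model" is a hypothetical model name.
payload = {
    "model": "placeholder-model",
    "prompt": "Hello, xFasterTransformer!",
    "max_tokens": 32,
    "temperature": 0.7,
}

# Serialize to JSON, as the server expects in the request body.
body = json.dumps(payload)
print(body)

# With a server actually running (assumed address), one would send it via:
#   curl http://localhost:8000/v1/completions \
#        -H "Content-Type: application/json" \
#        -d '<body>'
```

This only builds and prints the request body; actually serving requires launching the fork's server as described in the linked vllm-xft.md.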
Hi,
Would you consider supporting vLLM, which is the de facto standard for LLM serving nowadays?
Thanks~