Closed lmitlaender closed 9 months ago
https://www.youtube.com/watch?v=TJ5K1CO9Wbs See parts of this video for some of the speedups that vllm can support
Might be interesting: https://github.com/ray-project/ray-llm integrates vllm with ray in a single solution combining multiple optimizations
This issues goal is to evaluate the vllm-project for llm serving (in comparison to ollama).
vllm source code: https://github.com/vllm-project/vllm