vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Performance]: underperforming in comparison with SGLang #7108

Open meetzuber opened 1 month ago

meetzuber commented 1 month ago

Proposal to improve performance

vLLM is underperforming in comparison with SGLang. Something needs optimization for better performance.

Report of performance regression

https://lmsys.org/blog/2024-07-25-sglang-llama3/
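The numbers in that post can be sanity-checked locally. Below is a minimal sketch that measures offline generation throughput with vLLM's Python API; the model name, prompt set, and sampling parameters are placeholders, and a fair comparison would run the equivalent workload through SGLang (the blog's serving-side numbers come from an online benchmark against a running server, which this does not reproduce).

```python
import time

from vllm import LLM, SamplingParams

# Placeholder model; the linked post benchmarks Llama 3 variants.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

# A batch of identical prompts gives a rough throughput number;
# a real benchmark should vary prompt and output lengths.
prompts = ["Explain the role of the KV cache in LLM serving."] * 64
params = SamplingParams(temperature=0.8, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s over {len(prompts)} requests")
```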

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

felixzhu555 commented 1 month ago

See #6801!