[Misc]: Concurrent API server requests are faster than offline inference in my tests — are there official vLLM performance benchmarks? #8610

Open · lwdnxu opened 1 month ago

lwdnxu commented 1 month ago

Anything you want to discuss about vllm.

In my testing with vLLM, sending concurrent requests to the API server yields higher throughput than running offline inference. Are there any performance tests published on the official vLLM website? Thank you.
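For reference, a minimal sketch of how the offline side of such a comparison could be measured with vLLM's `LLM` class. The model name, prompt set, and `max_tokens` here are placeholder assumptions, not the exact setup from the report above:

```python
import time

from vllm import LLM, SamplingParams

# Placeholder model and prompts; swap in whatever was used in the real test.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
prompts = ["Hello, my name is"] * 256
params = SamplingParams(max_tokens=128, ignore_eos=True)

# Time a single batched generate() call over all prompts.
start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to get tokens/s.
generated = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"offline throughput: {generated / elapsed:.1f} generated tokens/s")
```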


youkaichao commented 1 month ago

See https://github.com/vllm-project/vllm/tree/main/benchmarks for the benchmark scripts and https://perf.vllm.ai for the published performance results.
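Outside the official scripts, the online side of the comparison can be sketched with a concurrent client against the OpenAI-compatible server. This assumes a server already running (e.g. started via `vllm serve`) at a placeholder endpoint, with the same placeholder model and prompt set as above:

```python
import asyncio
import time

from openai import AsyncOpenAI

# Placeholder endpoint and key; vLLM's OpenAI-compatible server is assumed.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def one_request(prompt: str) -> int:
    resp = await client.completions.create(
        model="meta-llama/Llama-2-7b-hf",  # placeholder model name
        prompt=prompt,
        max_tokens=128,
    )
    # The server reports generated-token counts in the usage field.
    return resp.usage.completion_tokens


async def main() -> None:
    prompts = ["Hello, my name is"] * 256
    # Fire all requests concurrently so the server can batch them.
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request(p) for p in prompts))
    elapsed = time.perf_counter() - start
    print(f"online throughput: {sum(tokens) / elapsed:.1f} generated tokens/s")


asyncio.run(main())
```

Comparing the two numbers under matched model, prompts, and output lengths is what the official `benchmarks/` scripts do more rigorously.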