[Misc]: Concurrent API server requests are faster than offline inference in my tests — are there official vLLM performance benchmarks? #8610

Open · lwdnxu opened 1 month ago

lwdnxu commented 1 month ago

Anything you want to discuss about vllm.

In my testing with vLLM, sending concurrent requests to the API server yields higher throughput than running offline inference. Are there any performance tests published on the official vLLM website? Thank you.
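For reference, a minimal sketch of how the offline side of such a comparison could be measured with vLLM's `LLM` class. The model name, prompt set, and `max_tokens` here are placeholder assumptions, not the exact setup from the report above:

```python
import time

from vllm import LLM, SamplingParams

# Placeholder model and prompts; swap in whatever was used in the real test.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
prompts = ["Hello, my name is"] * 256
params = SamplingParams(max_tokens=128, ignore_eos=True)

# Time a single batched generate() call over all prompts.
start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests to get tokens/s.
generated = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"offline throughput: {generated / elapsed:.1f} generated tokens/s")
```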


youkaichao commented 1 month ago

See https://github.com/vllm-project/vllm/tree/main/benchmarks for the benchmark scripts and https://perf.vllm.ai for the published performance results.
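Outside the official scripts, the online side of the comparison can be sketched with a concurrent client against the OpenAI-compatible server. This assumes a server already running (e.g. started via `vllm serve`) at a placeholder endpoint, with the same placeholder model and prompt set as above:

```python
import asyncio
import time

from openai import AsyncOpenAI

# Placeholder endpoint and key; vLLM's OpenAI-compatible server is assumed.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def one_request(prompt: str) -> int:
    resp = await client.completions.create(
        model="meta-llama/Llama-2-7b-hf",  # placeholder model name
        prompt=prompt,
        max_tokens=128,
    )
    # The server reports generated-token counts in the usage field.
    return resp.usage.completion_tokens


async def main() -> None:
    prompts = ["Hello, my name is"] * 256
    # Fire all requests concurrently so the server can batch them.
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request(p) for p in prompts))
    elapsed = time.perf_counter() - start
    print(f"online throughput: {sum(tokens) / elapsed:.1f} generated tokens/s")


asyncio.run(main())
```

Comparing the two numbers under matched model, prompts, and output lengths is what the official `benchmarks/` scripts do more rigorously.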