sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

Add concurrency option in benchmark #2135

Closed cermeng closed 2 days ago

cermeng commented 2 days ago

Motivation

A load balancer often sits in front of the server to control request traffic, because the server itself cannot reject requests. This PR adds a concurrency option to the benchmark to simulate that situation. The code is borrowed from https://github.com/vllm-project/vllm/pull/9390
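To illustrate the idea, here is a minimal sketch of how a benchmark client could cap the number of in-flight requests, assuming an `asyncio.Semaphore`-based limit. It is not the code from this PR: `run_benchmark`, `fake_request`, and the `max_concurrency` parameter are hypothetical names chosen for the example, and the request is simulated with a short sleep rather than a real HTTP call.

```python
import asyncio


async def fake_request(i, state):
    """Stand-in for one benchmark request (hypothetical; no real server call)."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulated server latency
    state["in_flight"] -= 1
    return i


async def run_benchmark(num_requests, max_concurrency=None):
    """Send num_requests requests, optionally capping how many run at once.

    max_concurrency is an assumed parameter name for this sketch.
    Returns the per-request results and the peak number of in-flight requests.
    """
    # With no limit, every request is dispatched immediately.
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None
    state = {"in_flight": 0, "peak": 0}

    async def limited(i):
        if semaphore is None:
            return await fake_request(i, state)
        # Wait for a free slot, as a load balancer in front of the server would.
        async with semaphore:
            return await fake_request(i, state)

    results = await asyncio.gather(*(limited(i) for i in range(num_requests)))
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_benchmark(20, max_concurrency=4))
    print(f"completed={len(results)} peak_in_flight={peak}")
```

With the cap set, requests beyond the limit wait on the client side instead of piling up at the server, which approximates the behavior of a load balancer that admits only a fixed number of concurrent requests.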

Modifications

Checklist