Motivation
Since the server itself cannot reject requests, a load balancer may sit above it to control the request traffic. This PR allows the benchmark to simulate that situation. Code is borrowed from https://github.com/vllm-project/vllm/pull/9390.
Modifications
Add an option --max-concurrency to bench_serving.py.
Make sure no more than max-concurrency requests reach the server concurrently (a sketch of this pattern follows the list below).
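The referenced vLLM PR implements this kind of cap with an asyncio semaphore, so the sketch below illustrates that general pattern rather than the exact code in bench_serving.py; `send_request` and `run_benchmark` are hypothetical placeholders for the real per-request coroutine and benchmark driver.

```python
import asyncio
from typing import Optional

async def send_request(payload: dict) -> dict:
    """Placeholder for the real per-request coroutine in bench_serving.py."""
    await asyncio.sleep(0.1)  # stand-in for the actual HTTP call to the server
    return {"ok": True}

async def run_benchmark(requests: list[dict], max_concurrency: Optional[int]) -> list[dict]:
    # When --max-concurrency is set, gate every request through a semaphore so
    # that at most `max_concurrency` requests are in flight at any moment.
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def limited_request(payload: dict) -> dict:
        if semaphore is None:
            return await send_request(payload)
        async with semaphore:
            return await send_request(payload)

    return await asyncio.gather(*(limited_request(r) for r in requests))

if __name__ == "__main__":
    dummy_requests = [{"prompt": f"req-{i}"} for i in range(32)]
    results = asyncio.run(run_benchmark(dummy_requests, max_concurrency=8))
    print(f"Completed {len(results)} requests")
```

With max_concurrency=8, the 32 requests are issued at most eight at a time, which mimics a load balancer throttling traffic in front of the server; with the option unset, all requests are dispatched as before.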
Checklist