Benchmarking summary:
Time taken for tests: 22.512 seconds
Expected number of requests: 100
Number of concurrency: 128
Total requests: 100
Succeed requests: 100
Failed requests: 0
Average QPS: 4.442
Average latency: 14.140
Throughput(average output tokens per second): 891.275
Average time to first token: 2.701
Average input tokens per request: 28.890
Average output tokens per request: 200.640
Average time per output token: 0.00112
Average package per request: 191.830
Average package latency: 0.060
Percentile of time to first token:
p50: 2.7137
p66: 2.7370
p75: 2.7879
p80: 2.8042
p90: 2.8816
p95: 2.9215
p98: 2.9364
p99: 2.9847
Percentile of request latency:
p50: 14.7637
p66: 17.0512
p75: 17.7525
p80: 18.3740
p90: 19.7777
p95: 20.1707
p98: 21.1066
p99: 22.5016
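
Several of the averages above appear to be derivable from the other reported figures. The sketch below is a consistency check only, under the assumption (not confirmed by the tool's output) that QPS is total requests divided by elapsed time, throughput is total output tokens divided by elapsed time, and time per output token is the reciprocal of throughput. All variable names are illustrative.

```python
# Values copied from the benchmarking summary above.
elapsed_s = 22.512            # Time taken for tests (seconds)
total_requests = 100          # Total requests
avg_output_tokens = 200.640   # Average output tokens per request
reported_qps = 4.442          # Average QPS
reported_throughput = 891.275 # Output tokens per second
reported_tpot = 0.00112       # Average time per output token

# Assumed definitions (hypothetical, inferred from the numbers only).
qps = total_requests / elapsed_s
throughput = total_requests * avg_output_tokens / elapsed_s
time_per_token = 1 / reported_throughput

print(f"QPS:            {qps:.3f} (reported {reported_qps})")
print(f"Throughput:     {throughput:.3f} (reported {reported_throughput})")
print(f"Time per token: {time_per_token:.5f} (reported {reported_tpot})")
```

Under these assumptions the recomputed values agree with the reported ones to within the rounding of the printed inputs; the small gap in throughput is of the size expected from the elapsed time being shown to only three decimal places.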