Closed zeng-zc closed 1 month ago
The doc says time.perf_counter()
"Return the value (in fractional seconds)". Am I missing something?
Oh, I'm sorry, the doc is right. I wrote a little test:
# Python program to measure elapsed time with perf_counter()
import time
# Start the stopwatch / counter
t1_start = time.perf_counter()
time.sleep(3)
# Stop the stopwatch / counter
t1_stop = time.perf_counter()
print("Counter values (stop, start):", t1_stop, t1_start)
print("Elapsed time during the whole program in seconds:",
      t1_stop - t1_start)
and the output:
Counter values (stop, start): 25354205.356015064 25354202.35303321
Elapsed time during the whole program in seconds: 3.0029818527400494
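As an aside, the large absolute values above are expected: perf_counter()'s reference point is arbitrary, so only the difference between two calls is a meaningful duration. A minimal sketch (with a shorter sleep than the test above) illustrating this:

```python
import time

# time.perf_counter() returns fractional seconds measured from an
# arbitrary reference point (hence values like 25354205.35 above),
# so the absolute value is meaningless; only the difference between
# two calls is a valid duration.
start = time.perf_counter()
time.sleep(0.1)
elapsed = time.perf_counter() - start

# elapsed is in seconds, not microseconds
assert 0.05 < elapsed < 1.0
print(f"elapsed: {elapsed:.3f} s")
```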
So the benchmark output is right. What an amazing thing that the latency is so big (mean ~800 s) when processing 3000 requests...
Sorry, it's my fault, please close this issue.
It's not a bug.
Describe the bug
I ran a benchmark like this:
The model is a Qwen2-72B-instruct model, and the output of this benchmark is:
We can see the time units are all ms, but they should be microseconds. I've checked the code: https://github.com/sgl-project/sglang/blob/main/python/sglang/bench_serving.py#L99. time.perf_counter() returns microseconds on my system (A800-SXM4-40GB). Perhaps the return value of this function varies across systems. We'd better use time.perf_counter_ns(), which is guaranteed to return nanoseconds, to fix this bug, as the docs say: https://docs.python.org/3/library/time.html#time.perf_counter
Reproduction
client side:
server side: just start serving any 72B or similar LLM with the latest sglang
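As a side note, the time.perf_counter_ns() variant suggested in the description removes any doubt about units by returning an integer nanosecond count; a minimal sketch of how a timing with it could look:

```python
import time

# time.perf_counter_ns() returns integer nanoseconds from the same
# arbitrary reference as perf_counter(), so the unit is unambiguous
# and no float rounding is involved.
t0 = time.perf_counter_ns()
time.sleep(0.1)
elapsed_ns = time.perf_counter_ns() - t0

# Convert explicitly when reporting in other units.
elapsed_ms = elapsed_ns / 1_000_000
assert elapsed_ms >= 50  # slept ~100 ms
print(f"elapsed: {elapsed_ms:.1f} ms")
```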
Environment
8x A800-SXM4-40GB GPUs for both server and client.