sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Bug] The time unit in bench_serving is wrong on A800-SXM4-40GB, using perf_counter_ns could fix #1630

Closed zeng-zc closed 1 month ago

zeng-zc commented 1 month ago

Describe the bug

I ran the benchmark as follows:

python -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 3000 --random-input 1024 --random-output 1024 --random-range-ratio 0.5

The model is Qwen2-72B-Instruct. The benchmark output is in the attached screenshot: sglang_bench_serving

The reported time units are all ms, but they should be microseconds.

I checked the code: https://github.com/sgl-project/sglang/blob/main/python/sglang/bench_serving.py#L99

On my system (A800-SXM4-40GB), time.perf_counter() appears to return microseconds. Perhaps the unit of its return value varies across systems. It would be safer to use time.perf_counter_ns(), which always returns nanoseconds, as the docs say: https://docs.python.org/3/library/time.html#time.perf_counter

Reproduction

client side:

python -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 3000 --random-input 1024 --random-output 1024 --random-range-ratio 0.5

server side: start serving any 72B or similar LLM with the latest sglang

Environment

A800-SXM4-40GB * 8 gpus for both server and client.

# python3 -m sglang.check_env                                                                                                                                                                                                        
Python: 3.10.14 (main, Apr  6 2024, 18:45:05) [GCC 9.4.0]                                                                                              
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A800-SXM4-40GB                                                                                                                                                              
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.0                                                                                                                                                             
CUDA_HOME: /usr/local/cuda                                                                                                                                                                              
NVCC: Cuda compilation tools, release 12.4, V12.4.131                                                                                                                                                   
CUDA Driver Version: 535.86.10                                                                                                                                                                          
PyTorch: 2.4.0+cu121                                                                                                                                                                                    
sglang: 0.3.0                                                                                                                                                                                           
flashinfer: 0.1.6+cu124torch2.4                                                                                                                                                                         
triton: 3.0.0                                                                                                                                                                                           
transformers: 4.44.2                                                                                                                                                                                    
requests: 2.32.3                                                                                                                                                                                        
tqdm: 4.66.5                                                                                                                                                                                            
numpy: 1.26.4                                                                                                                                                                                           
aiohttp: 3.10.5                                                                                                                                                                                         
fastapi: 0.112.2                                                                                                                                                                                         
hf_transfer: 0.1.8                                                                                                                                                                                       
huggingface_hub: 0.24.6                                                                                                                                                                                  
interegular: 0.3.3                                                                                                                                                                                       
packaging: 24.1                                                                                                                                                                                         
PIL: 10.4.0                                                                                                                                                                                             
psutil: 6.0.0                                                                                                                                                                                                                                                      
pydantic: 2.8.2                                                                                                                                                                                                                                                    
uvicorn: 0.30.6                                                                                                                                                                                                                                                    
uvloop: 0.20.0                                                                                                                                                                                                                                                     
zmq: 26.2.0                                                                                                                                                                                              
vllm: 0.5.5                                                                                                                                                                                              
multipart: 0.0.9                                                                                                                                                                                                                                                   
openai: 1.43.0                                                                                                                                                                                                                                                     
anthropic: 0.34.1                                                                                                                                                                                                                                                  
NVIDIA Topology:                                                                                                                                                                                                                                                   
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    CPU Affinity    NUMA Affinity   GPU NUMA ID                                                                                                        
GPU0     X      NV8     NV8     NV8     NV8     NV8     NV8     NV8     PXB     NODE    SYS     SYS     NODE    0-27,56-83      0               N/A                                                                                                                
GPU1    NV8      X      NV8     NV8     NV8     NV8     NV8     NV8     PXB     NODE    SYS     SYS     NODE    0-27,56-83      0               N/A                                                                                                                                                            
GPU2    NV8     NV8      X      NV8     NV8     NV8     NV8     NV8     NODE    PXB     SYS     SYS     NODE    0-27,56-83      0               N/A                                                                                                                                                            
GPU3    NV8     NV8     NV8      X      NV8     NV8     NV8     NV8     NODE    PXB     SYS     SYS     NODE    0-27,56-83      0               N/A                                                                                                                                                            
GPU4    NV8     NV8     NV8     NV8      X      NV8     NV8     NV8     SYS     SYS     PXB     NODE    SYS     28-55,84-111    1               N/A                                                     
GPU5    NV8     NV8     NV8     NV8     NV8      X      NV8     NV8     SYS     SYS     PXB     NODE    SYS     28-55,84-111    1               N/A                                                                                                                
GPU6    NV8     NV8     NV8     NV8     NV8     NV8      X      NV8     SYS     SYS     NODE    PXB     SYS     28-55,84-111    1               N/A                                                                                                                
GPU7    NV8     NV8     NV8     NV8     NV8     NV8     NV8      X      SYS     SYS     NODE    PXB     SYS     28-55,84-111    1               N/A                                                                                                                                                            
NIC0    PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      NODE    SYS     SYS     NODE                                                                                                                                                                                                   
NIC1    NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE     X      SYS     SYS     NODE                                                                                                                                                                                                   
NIC2    SYS     SYS     SYS     SYS     PXB     PXB     NODE    NODE    SYS     SYS      X      NODE    SYS                                                                                              
NIC3    SYS     SYS     SYS     SYS     NODE    NODE    PXB     PXB     SYS     SYS     NODE     X      SYS                                                                                                                                                                                                    
NIC4    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    NODE    SYS     SYS      X                                                                                                                                                                                                     

Legend:                                                                                                                                                

  X    = Self                                                                                                                                          
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)                                                                                                                                                                                                         
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node                                                                                                                                                                                                   
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)                                                                  
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)                                                         
  PIX  = Connection traversing at most a single PCIe bridge                                                                                            
  NV#  = Connection traversing a bonded set of # NVLinks                                                                                                                                                                                                           

NIC Legend:                                                                

  NIC0: mlx5_2                                                             
  NIC1: mlx5_3                                                             
  NIC2: mlx5_4                                                                                                                                                                                                                                                                                                 
  NIC3: mlx5_5                                                             
  NIC4: mlx5_bond_0                                                        

ulimit soft: 1048576

merrymercy commented 1 month ago

The doc says time.perf_counter() "Return the value (in fractional seconds)". Am I missing something?
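
For anyone who wants to check this on their own machine, here is a minimal sketch (not taken from bench_serving.py; the helper name `elapsed_both` is made up for illustration). It times the same short sleep with both clocks: `time.perf_counter()` gives a float in seconds, and `time.perf_counter_ns()` gives an int in nanoseconds, so the two results should agree once converted to ms.

```python
import time


def elapsed_both(sleep_s: float = 0.05):
    """Time one sleep with both clocks (hypothetical helper for illustration).

    Returns (elapsed seconds as float, elapsed nanoseconds as int).
    """
    start_s = time.perf_counter()      # float, fractional seconds
    start_ns = time.perf_counter_ns()  # int, nanoseconds
    time.sleep(sleep_s)
    stop_ns = time.perf_counter_ns()
    stop_s = time.perf_counter()
    return stop_s - start_s, stop_ns - start_ns


elapsed_s, elapsed_ns = elapsed_both()

# Convert both measurements to milliseconds for comparison.
print(f"perf_counter:    {elapsed_s:.6f} s = {elapsed_s * 1000:.3f} ms")
print(f"perf_counter_ns: {elapsed_ns} ns = {elapsed_ns / 1e6:.3f} ms")
```

Only differences between two readings are meaningful; the absolute value of either counter has an undefined reference point.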

zeng-zc commented 1 month ago

> The doc says time.perf_counter() "Return the value (in fractional seconds)". Am I missing something?

Oh, I'm sorry, the doc is right. I wrote a small test:

# Measure a 3-second sleep with time.perf_counter()
import time

# Start the stopwatch / counter
t1_start = time.perf_counter()

time.sleep(3)

# Stop the stopwatch / counter
t1_stop = time.perf_counter()

# Print the raw counter values: stop first, then start
print("Elapsed time:", t1_stop, t1_start)

print("Elapsed time during the whole program in seconds:",
      t1_stop - t1_start)

and the output:

Elapsed time: 25354205.356015064 25354202.35303321
Elapsed time during the whole program in seconds: 3.0029818527400494

So the benchmark output is right. It is surprising that the latency is so large (mean ~800 s) when processing 3000 requests...
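
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the client submits all 3000 requests up front and the server can only work on a limited number at a time, most requests spend a long time waiting in the queue, and end-to-end latency includes that wait. The toy model below is only a sketch. The function name, the concurrency of 300, and the 60 s per wave are all made-up numbers for illustration; it does not model sglang's actual scheduler.

```python
def mean_latency_s(num_requests: int, concurrency: int, service_time_s: float) -> float:
    """Mean end-to-end latency in a toy queueing model (hypothetical).

    Assumes every request arrives at t=0 and the server processes them in
    fixed-size waves of `concurrency` requests, each wave taking
    `service_time_s` seconds. A request's latency is the time at which
    its wave finishes.
    """
    total = 0.0
    for i in range(num_requests):
        wave = i // concurrency                # 0-based index of this request's wave
        total += (wave + 1) * service_time_s   # finish time of that wave
    return total / num_requests


# Made-up numbers: 3000 requests, 300 served at a time, 60 s per wave.
print(mean_latency_s(3000, 300, 60.0))
```

In this model the mean latency grows roughly linearly with the number of queued requests, so a mean in the hundreds of seconds is plausible for a large burst even when each individual wave is fast.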

Sorry, it's my fault, please close this issue.

zeng-zc commented 1 month ago

It's not a bug.