[Open] BestKuan opened 4 months ago
chunked_prefill_enable = False
INFO 09-01 12:46:11 async_llm_engine.py:268] 7cbe74f5c90c4a95954ae8b87d36a3c6 finished E2E: 0.29664182662963867, TTFT: 0.29621362686157227, TBT: 0.00042819976806640625, TIQ: 0.001392364501953125
INFO 09-01 12:46:15 async_llm_engine.py:268] 9bbc02b5dc904963a915612fc8951d0a finished E2E: 0.29630255699157715, TTFT: 0.2959132194519043, TBT: 0.00038933753967285156, TIQ: 0.0011632442474365234
chunked_prefill_enable = True

INFO 09-01 12:52:55 async_llm_engine.py:268] f4ce2ce1237146b79df1e698d6d70582 finished E2E: 0.3303070068359375, TTFT: 0.32995128631591797, TBT: 0.00035572052001953125, TIQ: 0.0012929439544677734
INFO 09-01 12:53:00 async_llm_engine.py:268] b03a99b525da4bfd8ef6ef1928030a6b finished E2E: 0.3486812114715576, TTFT: 0.3483591079711914, TBT: 0.00032210350036621094, TIQ: 0.0012426376342773438
When chunked prefill is enabled, TTFT increases from 296 ms to 330 ms.
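The TTFT regression can be read straight out of the logs above. A small self-contained sketch that parses the pasted INFO lines and averages TTFT per configuration (the regex and helper are my own, not part of vLLM):

```python
import re

# The two INFO lines from each run pasted above; TTFT values are in seconds.
log_without_cp = """\
INFO 09-01 12:46:11 async_llm_engine.py:268] 7cbe74f5c90c4a95954ae8b87d36a3c6 finished E2E: 0.29664182662963867, TTFT: 0.29621362686157227, TBT: 0.00042819976806640625, TIQ: 0.001392364501953125
INFO 09-01 12:46:15 async_llm_engine.py:268] 9bbc02b5dc904963a915612fc8951d0a finished E2E: 0.29630255699157715, TTFT: 0.2959132194519043, TBT: 0.00038933753967285156, TIQ: 0.0011632442474365234"""

log_with_cp = """\
INFO 09-01 12:52:55 async_llm_engine.py:268] f4ce2ce1237146b79df1e698d6d70582 finished E2E: 0.3303070068359375, TTFT: 0.32995128631591797, TBT: 0.00035572052001953125, TIQ: 0.0012929439544677734
INFO 09-01 12:53:00 async_llm_engine.py:268] b03a99b525da4bfd8ef6ef1928030a6b finished E2E: 0.3486812114715576, TTFT: 0.3483591079711914, TBT: 0.00032210350036621094, TIQ: 0.0012426376342773438"""

TTFT_RE = re.compile(r"TTFT: ([0-9.]+)")

def mean_ttft_ms(log: str) -> float:
    """Average TTFT across all requests in a log snippet, in milliseconds."""
    vals = [float(m.group(1)) for m in TTFT_RE.finditer(log)]
    return 1000 * sum(vals) / len(vals)

print(f"chunked prefill off: {mean_ttft_ms(log_without_cp):.1f} ms")
print(f"chunked prefill on:  {mean_ttft_ms(log_with_cp):.1f} ms")
```

Averaged over both requests, this gives roughly 296 ms without chunked prefill and 339 ms with it, consistent with the 296 ms -> 330 ms observation for the first request.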
me too!
I tested vLLM's benchmark_throughput.py and found that performance with chunked prefill enabled is lower than the default. How can I deal with this problem?
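For an A/B comparison, something like the following is a reasonable sketch. The flag names are taken from the vLLM benchmark script as I understand it, and the model path and token budget are placeholders/assumptions, not values from this thread; tuning --max-num-batched-tokens in particular changes how aggressively prefills are chunked:

```shell
# Baseline: chunked prefill disabled (the default)
python benchmarks/benchmark_throughput.py \
    --model <model-path> \
    --input-len 512 --output-len 128

# Same workload with chunked prefill enabled.
# Assumption: a larger batched-token budget (e.g. 4096) reduces the number
# of chunks per prefill and can narrow the TTFT/throughput gap.
python benchmarks/benchmark_throughput.py \
    --model <model-path> \
    --input-len 512 --output-len 128 \
    --enable-chunked-prefill \
    --max-num-batched-tokens 4096
```
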
Your current environment (if you think it is necessary)

No response
Are the parameters set correctly?