ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs
Apache License 2.0
590 stars, 93 forks

Improve benchmark tput by moving prompt preparation outside of loop #54

Closed (gracehonv closed 3 months ago)

gracehonv commented 3 months ago

Moved the prompt-generation call, randomly_sample_sonnet_lines_prompt, out of the request send loop so that the loop can issue load to the server faster. Previously, building the next prompt between sends added an artificial delay that lowered the measured benchmark throughput. Also changed the tokenizer to be instantiated only once, outside the prompt generation loop, to speed up the overall test. After this change I've seen up to a 2x improvement in achieved server throughput on some small workloads. This change should allow a more accurate measurement of true server throughput.
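For reviewers, here is a minimal sketch of the pattern described above. It is not the actual diff: `WordTokenizer`, `make_prompt`, `prepare_prompts`, and `run_send_loop` are hypothetical stand-ins for illustration, and the toy tokenizer just splits on whitespace. The idea is to build every prompt (with a single tokenizer instance) before timing starts, so the send loop only sends.

```python
import random
import time

# Hypothetical stand-ins for illustration; not llmperf's actual code.
LINES = [
    "Shall I compare thee to a summer's day?",
    "Thou art more lovely and more temperate:",
    "Rough winds do shake the darling buds of May,",
]


class WordTokenizer:
    """Toy tokenizer that treats whitespace-separated words as tokens."""

    def encode(self, text):
        return text.split()


def make_prompt(tokenizer, num_lines, rng):
    """Sample lines at random and return (prompt_text, token_count)."""
    text = "\n".join(rng.choice(LINES) for _ in range(num_lines))
    return text, len(tokenizer.encode(text))


def prepare_prompts(num_requests, num_lines=4, seed=0):
    """Build all prompts up front, reusing one tokenizer instance."""
    tokenizer = WordTokenizer()  # created once, not once per prompt
    rng = random.Random(seed)
    return [make_prompt(tokenizer, num_lines, rng) for _ in range(num_requests)]


def run_send_loop(prompts, send):
    """Send pre-built prompts back to back; return elapsed seconds."""
    start = time.monotonic()
    for prompt in prompts:
        send(prompt)  # no prompt construction inside the timed loop
    return time.monotonic() - start


if __name__ == "__main__":
    sent = []
    prompts = prepare_prompts(num_requests=8)
    elapsed = run_send_loop(prompts, sent.append)
    print(f"sent {len(sent)} prompts in {elapsed:.6f}s")
```

The trade-off is that all prompts are held in memory before the run starts, which seems acceptable for the request counts a benchmark like this typically uses.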

gracehonv commented 3 months ago

@avnishn or @rickyyx would it be possible to get this PR reviewed? Thank you!