Fix decoding throughput computation

ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs

Apache License 2.0

470 stars 69 forks

Closed comaniac closed 2 months ago

comaniac commented 2 months ago

Add a random seed to make the benchmarking reproducible.
Currently, decoding throughput is computed as decoding_step / end_to_end_latency. However, one decoding step may generate multiple tokens, so the step count can undercount the tokens actually produced. This PR instead computes decoding throughput from the correct number of output tokens, once that count is available.
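To illustrate the two changes, here is a minimal sketch. It is not the code from this PR: the function names (`decoding_throughput`, `sample_prompt_lengths`) and all numbers are hypothetical. It shows how a step-based count and a token-based count diverge when a step emits more than one token, and how a fixed seed makes a sampled workload repeatable.

```python
import random


def decoding_throughput(num_output_tokens: int, end_to_end_latency_s: float) -> float:
    """Return output tokens per second for one request (hypothetical helper)."""
    if end_to_end_latency_s <= 0:
        raise ValueError("latency must be positive")
    return num_output_tokens / end_to_end_latency_s


def sample_prompt_lengths(seed: int, n: int) -> list[int]:
    """Draw n prompt lengths from a seeded generator (hypothetical workload)."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [rng.randint(16, 512) for _ in range(n)]


# Assumed example request: 50 decoding steps produced 120 tokens in 2 seconds.
decoding_steps = 50
num_output_tokens = 120
latency_s = 2.0

step_based = decoding_steps / latency_s                           # counts steps, not tokens
token_based = decoding_throughput(num_output_tokens, latency_s)   # counts tokens

# The same seed yields the same sampled workload on every run.
run_a = sample_prompt_lengths(seed=1234, n=5)
run_b = sample_prompt_lengths(seed=1234, n=5)
```

With these assumed numbers the step-based figure is 25 per second while the token-based figure is 60 tokens per second, so counting steps would under-report throughput by more than half.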

cc @rickyyx