Measure and report stats that users will experience with different batch sizes, input prompt context lengths, and output generation lengths. For example:
prefill:
TTFT (time to first token)
decode:
throughput (tokens per second, or time per decode) across generation sequence. Throughput changes with context length, report full curve or some key points.
Measure and report stats that users will experience with different batch sizes, input prompt context lengths, and output generation lengths. For example:
prefill:
decode: