openvinotoolkit / openvino.genai

Run Generative AI models using native OpenVINO C++ API
Apache License 2.0

How to understand the metrics in benchmark_genai's results? #875

Open aoke79 opened 1 week ago

aoke79 commented 1 week ago

How should I read the reported metrics, such as TTFT, TPOT, and Throughput? Which one is the first-token latency, and which is the average time for the second and subsequent tokens?

Load time: 18507.00 ms
Generate time: 21.88 ± 1.20 ms
Tokenization time: 0.28 ± 0.02 ms
Detokenization time: 0.94 ± 0.73 ms
TTFT: 20.88 ± 0.47 ms
TPOT: 20.88 ± 0.47 ms
Throughput : 47.89 ± 1.07 tokens/s

Thanks a lot,

Wovchena commented 1 week ago

You can find the description at https://github.com/openvinotoolkit/openvino.genai/blob/6003234f6bbb03cb3f5a3e66f4a4f5ce00cdeb18/src/cpp/include/openvino/genai/perf_metrics.hpp#L55
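
As a rough illustration of how these three numbers usually relate: TTFT (time to first token) is the first-token latency, TPOT (time per output token) is the average time for each token after the first, and throughput is tokens per second. The sketch below derives them from per-token timestamps. It is a minimal example under stated assumptions, not the library's implementation: `summarize`, its parameters, and the sample timestamps are hypothetical names and values made up for illustration. The last line only checks that the throughput posted above is consistent with 1000 / TPOT.

```python
from statistics import mean


def summarize(start_ms, token_times_ms):
    """Derive TTFT, TPOT and throughput from per-token timestamps (in ms).

    Hypothetical helper for illustration only; it is not part of openvino.genai.
    """
    if len(token_times_ms) < 2:
        raise ValueError("need at least two tokens to compute TPOT")
    # TTFT: delay between the start of generation and the first token.
    ttft = token_times_ms[0] - start_ms
    # TPOT: mean gap between consecutive tokens, excluding the first token.
    gaps = [b - a for a, b in zip(token_times_ms, token_times_ms[1:])]
    tpot = mean(gaps)
    # Throughput in tokens/s, taken as the inverse of TPOT (TPOT is in ms).
    throughput = 1000.0 / tpot
    return {"ttft_ms": ttft, "tpot_ms": tpot, "throughput_tps": throughput}


# Made-up timestamps: first token after 30 ms, then one token every 20 ms.
metrics = summarize(0.0, [30.0, 50.0, 70.0, 90.0])
print(metrics)

# Cross-check against the numbers in the question: 1000 / 20.88 ms per token.
reported_throughput = 1000.0 / 20.88
print(round(reported_throughput, 2))
```

With the figures from the question, 1000 / 20.88 gives about 47.89 tokens/s, which matches the reported throughput.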