mlc-ai / llm-perf-bench

Apache License 2.0

tok/sec metric is not clearly defined #7

Closed JohannesGaessler closed 1 year ago

JohannesGaessler commented 1 year ago

I think the tok/sec metric used in the README needs to be defined more clearly. For example, it's not clear whether it measures the token rate during generation only (e.g. llama.cpp's "eval" print) or over the total runtime (e.g. the Oobabooga webui print).
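To illustrate the ambiguity, here is a minimal sketch of the two interpretations. All timings and token counts below are made-up example numbers, not measurements from llm-perf-bench:

```python
# Hypothetical timings illustrating the two readings of "tok/sec".
# The numbers are assumptions for illustration only.
generated_tokens = 128
prefill_seconds = 0.8   # time spent processing the prompt
decode_seconds = 4.2    # time spent generating new tokens

# Decode-only throughput (what llama.cpp's "eval" line reports):
decode_tok_per_sec = generated_tokens / decode_seconds

# End-to-end throughput (generation tokens over total runtime,
# as a webui's total-runtime print would report):
total_tok_per_sec = generated_tokens / (prefill_seconds + decode_seconds)

print(f"decode-only: {decode_tok_per_sec:.1f} tok/s")
print(f"end-to-end:  {total_tok_per_sec:.1f} tok/s")
```

The decode-only figure is always the higher of the two, and the gap grows with prompt length, which is why the README should state which one it reports.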

junrushao commented 1 year ago

We only care about decoding performance for now; prefilling is not the bottleneck.