huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.73k stars 1.01k forks

Benchmarker Not Representative Of Real Performance? #1565

Closed mallorbc closed 5 months ago

mallorbc commented 6 months ago

System Info

Docker, A100 GPU

Information

Tasks

Reproduction

Deploy any model and benchmark it. Take note of the throughput.
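For reference, a minimal sketch of the reproduction. The model id and container id are illustrative placeholders, and the benchmark flags assume the `text-generation-benchmark` tool bundled in the TGI container; check `--help` for the exact options in your version:

```shell
# Launch TGI serving a model on the GPU (model id is illustrative).
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference \
    --model-id meta-llama/Llama-2-7b-hf

# In another terminal, run the benchmarker inside the same container
# and note the reported throughput for a fixed batch/input/output size.
docker exec -it <container-id> text-generation-benchmark \
    --tokenizer-name meta-llama/Llama-2-7b-hf \
    --batch-size 32 --sequence-length 512 --decode-length 128
```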

Expected behavior

This is more of a question. When using the benchmarker, we can control things like the batch size, input size, output size, etc.

However, since TGI uses continuous batching, wouldn't real-world results be better than those the benchmarker reports for a fixed batch size? In other words, if I want a more accurate picture, should I write my own benchmarker?

Or does the benchmarker do continuous batching?
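The intuition behind the question can be illustrated with a toy scheduling model (hypothetical numbers, not TGI's actual scheduler): with static batching a batch runs until its longest request finishes, while with continuous batching a finished slot is refilled immediately, so mixed output lengths need fewer total decode steps.

```python
def static_batch_steps(lengths, batch_size):
    """Static batching: each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batch_steps(lengths, batch_size):
    """Continuous batching: a finished slot is refilled from the queue at once."""
    pending = list(lengths)
    active = [pending.pop(0) for _ in range(min(batch_size, len(pending)))]
    steps = 0
    while active:
        steps += 1
        # One decode step: every active request emits one token.
        active = [n - 1 for n in active if n > 1]
        # Refill freed slots with waiting requests, if any.
        while pending and len(active) < batch_size:
            active.append(pending.pop(0))
    return steps


# Mixed short and long requests, where static batching wastes slots.
lengths = [10, 200, 10, 200, 10, 200, 10, 200]
print(static_batch_steps(lengths, batch_size=4))      # → 400
print(continuous_batch_steps(lengths, batch_size=4))  # → 230
```

Under this model the continuous scheduler finishes the same workload in fewer steps, which is why a fixed-batch-size benchmark can understate real-world throughput when request lengths vary.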

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

vishnumadhu365 commented 2 months ago

It would be great if someone from HF could comment on this query. Thanks @OlivierDehaene