huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
8.73k stars 1.01k forks

Benchmarker Not Representative Of Real Performance? #1565

Closed mallorbc closed 5 months ago

mallorbc commented 6 months ago

System Info

Docker, A100 GPU

Information

Tasks

Reproduction

Deploy any model and benchmark it. Take note of the throughput.
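For reference, a minimal sketch of the reproduction. The model id and container id are illustrative placeholders, and the benchmark flags assume the `text-generation-benchmark` tool bundled in the TGI container; check `--help` for the exact options in your version:

```shell
# Launch TGI serving a model on the GPU (model id is illustrative).
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference \
    --model-id meta-llama/Llama-2-7b-hf

# In another terminal, run the benchmarker inside the same container
# and note the reported throughput for a fixed batch/input/output size.
docker exec -it <container-id> text-generation-benchmark \
    --tokenizer-name meta-llama/Llama-2-7b-hf \
    --batch-size 32 --sequence-length 512 --decode-length 128
```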

Expected behavior

This is more of a question. When using the benchmarker, we can control things like the batch size, input size, output size, etc.

However, since TGI uses continuous batching, wouldn't real-world results be better than those the benchmarker reports for a fixed batch size? In other words, if I want a more accurate picture, should I write my own benchmarker?

Or does the benchmarker do continuous batching?
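The intuition behind the question can be illustrated with a toy scheduling model (hypothetical numbers, not TGI's actual scheduler): with static batching a batch runs until its longest request finishes, while with continuous batching a finished slot is refilled immediately, so mixed output lengths need fewer total decode steps.

```python
def static_batch_steps(lengths, batch_size):
    """Static batching: each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batch_steps(lengths, batch_size):
    """Continuous batching: a finished slot is refilled from the queue at once."""
    pending = list(lengths)
    active = [pending.pop(0) for _ in range(min(batch_size, len(pending)))]
    steps = 0
    while active:
        steps += 1
        # One decode step: every active request emits one token.
        active = [n - 1 for n in active if n > 1]
        # Refill freed slots with waiting requests, if any.
        while pending and len(active) < batch_size:
            active.append(pending.pop(0))
    return steps


# Mixed short and long requests, where static batching wastes slots.
lengths = [10, 200, 10, 200, 10, 200, 10, 200]
print(static_batch_steps(lengths, batch_size=4))      # → 400
print(continuous_batch_steps(lengths, batch_size=4))  # → 230
```

Under this model the continuous scheduler finishes the same workload in fewer steps, which is why a fixed-batch-size benchmark can understate real-world throughput when request lengths vary.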

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

vishnumadhu365 commented 2 months ago

It would be great if someone from HF could comment on this query. Thanks @OlivierDehaene