Throughput should be `num_queries/latency` as opposed to `num_clients/latency`?

microsoft / DeepSpeedExamples

Example models using DeepSpeed

Apache License 2.0


Open goelayu opened 9 months ago

goelayu commented 9 months ago

The MII inference benchmark script computes throughput as `num_clients/latency`. Shouldn't this be `num_queries/latency`?

Also, why is throughput computed from the P95 latency rather than from the total time taken to process all the requests?
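To make the question concrete, here is a minimal sketch comparing the two calculations. It is not the benchmark script's actual code: the function names, the nearest-rank percentile, and the latency numbers are all made up for illustration, and it assumes each client sends its queries back to back.

```python
def percentile(values, pct):
    """Nearest-rank percentile (integer pct) of a non-empty list."""
    ordered = sorted(values)
    # Ceiling of pct * n / 100 using integer arithmetic only.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def throughput_from_clients(num_clients, latencies, pct=95):
    """Throughput as described in this issue: num_clients / P95 latency."""
    return num_clients / percentile(latencies, pct)


def throughput_from_wall_time(num_queries, total_seconds):
    """Alternative: completed queries divided by elapsed wall-clock time."""
    return num_queries / total_seconds


# Hypothetical run: 4 clients, 5 queries each, latencies in seconds.
# One query on the last client is a slow outlier.
per_client = [
    [0.5] * 5,
    [0.5] * 5,
    [0.5] * 5,
    [0.5] * 4 + [2.0],
]
latencies = [lat for client in per_client for lat in client]
num_clients = len(per_client)
num_queries = len(latencies)
# Clients run in parallel, so the run ends when the slowest client finishes.
wall_time = max(sum(client) for client in per_client)

p95 = percentile(latencies, 95)
by_clients = throughput_from_clients(num_clients, latencies)
by_wall_time = throughput_from_wall_time(num_queries, wall_time)

print(f"P95 latency:             {p95} s")
print(f"num_clients / P95:       {by_clients} queries/s")
print(f"num_queries / wall time: {by_wall_time} queries/s")
```

With these made-up numbers the P95 latency is 0.5 s, so `num_clients / P95` gives 8.0 queries/s, while `num_queries / wall time` gives 5.0 queries/s because the single slow query stretches the run to 4.0 s. The two definitions can disagree noticeably whenever the latency distribution has a tail.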