microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0
6.09k stars 1.04k forks

Throughput should be `num_queries/latency` as opposed to `num_clients/latency`? #858

Open goelayu opened 9 months ago

goelayu commented 9 months ago

The MII inference benchmark script computes throughput as `num_clients/latency`. Shouldn't this be `num_queries/latency`?

Also, why use P95 latency rather than the total time it took to process all the requests when computing throughput?
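
To make the question concrete, here is a minimal sketch comparing the candidate definitions side by side. It is not the benchmark script's code: `percentile`, `throughput_estimates`, and the sample latencies are hypothetical, and it assumes each client sends its queries one after another (closed loop), so the total elapsed time is how long one client takes to finish.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def throughput_estimates(latencies, num_clients, total_elapsed):
    """Return three throughput estimates in queries per second.

    latencies     -- per-query latencies in seconds, across all clients
    num_clients   -- number of concurrent clients (assumed closed loop)
    total_elapsed -- wall-clock seconds needed to finish every query
    """
    num_queries = len(latencies)
    p95 = percentile(latencies, 95)
    mean = sum(latencies) / num_queries
    return {
        # The formula the issue describes: clients divided by P95 latency.
        "clients_over_p95": num_clients / p95,
        # Same shape, but with mean latency instead of the tail.
        "clients_over_mean": num_clients / mean,
        # The alternative raised here: all queries over total time.
        "queries_over_elapsed": num_queries / total_elapsed,
    }


# Made-up data: 4 clients, each sending 5 queries back to back,
# four taking 1.0 s and one taking 2.0 s, so each client runs for 6.0 s.
sample_latencies = [1.0, 1.0, 1.0, 1.0, 2.0] * 4
estimates = throughput_estimates(sample_latencies, num_clients=4, total_elapsed=6.0)

for name, value in estimates.items():
    print(f"{name}: {value:.2f} queries/s")
```

With this made-up data, `num_clients / P95` gives 2.00 queries/s, while both `num_clients / mean latency` and `num_queries / total time` give about 3.33 queries/s. For closed-loop clients, concurrency divided by mean latency matches the measured rate (Little's law), so under these assumptions the tail percentile, not the use of `num_clients`, is what pulls the reported number down.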