mlcommons / inference_results_v1.1

This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark.
https://mlcommons.org/en/inference-datacenter-11/
Apache License 2.0

Why is the difference between the offline performance and the single stream performance for RNNT so big? #8

Open vid2022 opened 2 years ago

vid2022 commented 2 years ago

Hi,

I have noticed that the difference between the "offline" and "single stream" performance is a lot higher for RNN-T than for the other benchmarks.

For example, in NVIDIA's submission "1.1-100", the ratio of the "offline" samples/s to the calculated "single stream" samples/s is below 20 for every network except RNN-T, where it is ca. 292.
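For reference, a minimal sketch of how such a ratio can be computed. The single-stream result is reported as a per-sample latency, so its samples/s must be derived as the reciprocal of that latency. The numbers below are purely illustrative placeholders chosen to reproduce a ratio of ~292, not figures from the actual 1.1-100 submission:

```python
# Hypothetical numbers for illustration only; not taken from the
# actual NVIDIA 1.1-100 submission results.

def single_stream_samples_per_sec(latency_ms: float) -> float:
    """Convert a per-sample single-stream latency (ms) to samples/s."""
    return 1000.0 / latency_ms

offline_samples_per_sec = 100_000.0  # assumed offline throughput (samples/s)
latency_ms = 2.92                    # assumed single-stream latency (ms/sample)

ratio = offline_samples_per_sec / single_stream_samples_per_sec(latency_ms)
print(f"offline / single-stream ratio: {ratio:.0f}")  # prints 292 for these inputs
```

With these placeholder inputs the ratio works out to 292; plugging in the real submission numbers is the same calculation.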

Any help in clarifying this difference is appreciated.

Thank you!