DilipSequeira opened 4 years ago
I am broadly supportive of this change. In a perfect world, we would do this at the same time we switch to loadgen-over-network, but I don't think we have time this round for loadgen-over-network. For consistency, should we time all pre-processing?
Proposal:
WG comments:
This is for data center only, under the theory that data center inputs are usually compressed. OTOH, edge inputs are often raw.
Also, pre-processing should be added to all benchmarks (already present in RNN-T, none needed in many other benchmarks).
This is a good topic for more discussion.
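To make the proposal concrete, here is a minimal sketch (not the reference harness) of what timed preprocessing could look like with the LoadGen Python bindings: the QSL holds raw compressed bytes, and JPEG decode happens inside `issue_queries`, i.e. inside the measured latency. The data layout, the stub backend, and the 224x224 resize are illustrative assumptions, and exact binding signatures vary across LoadGen releases.

```python
# Sketch only: timing JPEG decode inside the LoadGen-issued query path.
# File layout, sample counts, and the backend stub are placeholders.
import array
import io

import numpy as np
from PIL import Image
import mlperf_loadgen as lg

class _StubBackend:
    """Stands in for the real inference engine."""
    def predict(self, tensor):
        return np.zeros(1000, dtype=np.float32)

backend = _StubBackend()
compressed_samples = {}   # sample index -> raw JPEG bytes (loaded untimed)

def load_samples(indices):
    # Untimed: only raw compressed bytes are staged in host memory.
    for i in indices:
        with open(f"data/{i}.jpg", "rb") as f:   # hypothetical dataset layout
            compressed_samples[i] = f.read()

def unload_samples(indices):
    for i in indices:
        compressed_samples.pop(i, None)

def issue_queries(query_samples):
    # Timed: decode + resize + inference all happen after LoadGen issues the query.
    for qs in query_samples:
        img = Image.open(io.BytesIO(compressed_samples[qs.index])).convert("RGB")
        tensor = np.asarray(img.resize((224, 224)), dtype=np.float32)
        result = backend.predict(tensor)
        buf = array.array("B", result.tobytes())
        addr, length = buf.buffer_info()
        lg.QuerySamplesComplete(
            [lg.QuerySampleResponse(qs.id, addr, length * buf.itemsize)])

def flush_queries():
    pass

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(50000, 1024, load_samples, unload_samples)
settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server
settings.mode = lg.TestMode.PerformanceOnly
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```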
As Scott pointed out, this is unfair to inference-only chip vendors. It's really hard to interpret a result in which a third-party image decompressor plays a huge role if preprocessing is timed. Only GPUs have the capability to handle both at the same time, so if MLPerf promotes this, does it mean the MLPerf WG prefers GPUs over inference-only chips?
Only GPUs have the capability to handle both at the same time, so if MLPerf promotes this, does it mean the MLPerf WG prefers GPUs over inference-only chips?
MLPerf ought to reward good designs. Image decompression is an important part of inference for many workloads. It is appropriate that better architectures with more capabilities have higher performance as measured by MLPerf.
It's really hard to interpret a result in which a third-party image decompressor plays a huge role if preprocessing is timed.
The only performance that benefits customers is end-to-end performance. If a chip is decompression limited, it is misleading to publish numbers that ignore this limitation. MLPerf ought to publish performance numbers that most clearly reflect real world performance. There are already high performance open source image decompression libraries. It is unlikely anyone will get an advantage by optimizing them. Submitters with dedicated hardware for image decoding ought to be rewarded for their ingenuity.
Hopefully, measuring preprocessing time will guide submitters toward measuring systems with realistic balances of decompression and inference capacity.
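To see why that balance matters, here is a back-of-the-envelope model (all numbers below are invented for illustration) of how a decode-limited system's end-to-end rate differs from its inference-only rate:

```python
# Illustrative model with made-up numbers; not measured from any real submission.
def end_to_end_qps(decode_qps: float, infer_qps: float) -> float:
    """Pipelined stages: throughput is capped by the slower stage."""
    return min(decode_qps, infer_qps)

# Hypothetical system: large inference capacity, modest JPEG decode capacity.
decode_qps = 20_000    # images/s the host or decode engine can decompress
infer_qps = 100_000    # images/s the accelerator can classify (pre-decoded)

print(end_to_end_qps(decode_qps, infer_qps))   # 20000: decode-limited
# With timed preprocessing the published number is 20k images/s,
# not the 100k that an inference-only measurement would imply.
```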
WG: will update with survey results next week. If there is no consensus from submitters, we will rely on the future vision advisory board. Scott suggested another end-to-end benchmark that tracks the whole system, including the networking card and graphics, and somehow accounts for their cost.
@TheKanter will follow up on data center specific submitters.
I think we are out of time to land this for 1.0. I propose merging with Loadgen over network and aim for Inference 1.1. Dilip, what do you think?
Agreed on all counts.
For the March '21 round (1.0?) we would like to see consideration of more timed preprocessing in the datacenter scenarios, specifically for the image models and for 3D-UNet. For edge, it makes sense that the submitter gets to choose the format because the input is often coming from a camera pipeline, but for datacenter it will typically be some form of compressed data (e.g. jpeg).
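For anyone who wants to gauge how much a timed JPEG decode would add on their own host path, a rough measurement is only a few lines of Python; the file name below is a placeholder, and a tuned decoder (libjpeg-turbo, nvJPEG, or a hardware engine) will of course give different numbers.

```python
# Rough per-sample JPEG decode cost; 'sample.jpg' is a placeholder input.
import io
import time

from PIL import Image

def mean_decode_ms(path: str, iters: int = 100) -> float:
    data = open(path, "rb").read()            # read once; time only the decode
    start = time.perf_counter()
    for _ in range(iters):
        Image.open(io.BytesIO(data)).convert("RGB")   # forces full decompression
    return (time.perf_counter() - start) * 1000.0 / iters

print(f"{mean_decode_ms('sample.jpg'):.2f} ms per decode")
```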
Let's discuss in the WG.