DilipSequeira opened 4 years ago
I am broadly supportive of this change. In a perfect world, we would do this at the same time we switch to loadgen-over-network, but I don't think we have time this round for loadgen-over-network. For consistency, should we time all pre-processing?
Proposal:
WG comments:
This is for data center only, under the theory that data center inputs are usually compressed. OTOH, edge inputs are often raw.
Also, pre-processing should be added to all benchmarks (already present in RNN-T, none needed in many other benchmarks).
This is a good topic for more discussion.
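To make the proposal concrete, here is a minimal sketch (not the reference harness) of what timed preprocessing could look like with the LoadGen Python bindings: the QSL holds raw compressed bytes, and JPEG decode happens inside `issue_queries`, i.e. inside the measured latency. The data layout, the stub backend, and the 224x224 resize are illustrative assumptions, and exact binding signatures vary across LoadGen releases.

```python
# Sketch only: timing JPEG decode inside the LoadGen-issued query path.
# File layout, sample counts, and the backend stub are placeholders.
import array
import io

import numpy as np
from PIL import Image
import mlperf_loadgen as lg

class _StubBackend:
    """Stands in for the real inference engine."""
    def predict(self, tensor):
        return np.zeros(1000, dtype=np.float32)

backend = _StubBackend()
compressed_samples = {}   # sample index -> raw JPEG bytes (loaded untimed)

def load_samples(indices):
    # Untimed: only raw compressed bytes are staged in host memory.
    for i in indices:
        with open(f"data/{i}.jpg", "rb") as f:   # hypothetical dataset layout
            compressed_samples[i] = f.read()

def unload_samples(indices):
    for i in indices:
        compressed_samples.pop(i, None)

def issue_queries(query_samples):
    # Timed: decode + resize + inference all happen after LoadGen issues the query.
    for qs in query_samples:
        img = Image.open(io.BytesIO(compressed_samples[qs.index])).convert("RGB")
        tensor = np.asarray(img.resize((224, 224)), dtype=np.float32)
        result = backend.predict(tensor)
        buf = array.array("B", result.tobytes())
        addr, length = buf.buffer_info()
        lg.QuerySamplesComplete(
            [lg.QuerySampleResponse(qs.id, addr, length * buf.itemsize)])

def flush_queries():
    pass

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(50000, 1024, load_samples, unload_samples)
settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server
settings.mode = lg.TestMode.PerformanceOnly
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```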
As Scott pointed out, this is unfair to inference-only chip vendors. It's really hard to interpret a result in which a third-party image decompressor plays a huge role if preprocessing is timed. Only GPUs have the capability to handle both at the same time, so if MLPerf promotes this, does it mean the MLPerf WG prefers GPUs over inference-only chips?
Only GPUs have the capability to handle both at the same time, so if MLPerf promotes this, does it mean the MLPerf WG prefers GPUs over inference-only chips?
MLPerf ought to reward good designs. Image decompression is an important part of inference for many workloads. It is appropriate that better architectures with more capabilities have higher performance as measured by MLPerf.
It's really hard to interpret a result in which a third-party image decompressor plays a huge role if preprocessing is timed.
The only performance that benefits customers is end-to-end performance. If a chip is decompression limited, it is misleading to publish numbers that ignore this limitation. MLPerf ought to publish performance numbers that most clearly reflect real world performance. There are already high performance open source image decompression libraries. It is unlikely anyone will get an advantage by optimizing them. Submitters with dedicated hardware for image decoding ought to be rewarded for their ingenuity.
Hopefully, measuring preprocessing time will guide submitters toward measuring systems with realistic balances of decompression and inference capacity.
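To see why that balance matters, here is a back-of-the-envelope model (all numbers below are invented for illustration) of how a decode-limited system's end-to-end rate differs from its inference-only rate:

```python
# Illustrative model with made-up numbers; not measured from any real submission.
def end_to_end_qps(decode_qps: float, infer_qps: float) -> float:
    """Pipelined stages: throughput is capped by the slower stage."""
    return min(decode_qps, infer_qps)

# Hypothetical system: large inference capacity, modest JPEG decode capacity.
decode_qps = 20_000    # images/s the host or decode engine can decompress
infer_qps = 100_000    # images/s the accelerator can classify (pre-decoded)

print(end_to_end_qps(decode_qps, infer_qps))   # 20000: decode-limited
# With timed preprocessing the published number is 20k images/s,
# not the 100k that an inference-only measurement would imply.
```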
WG: will update with survey results next week. If there is no consensus from submitters, we will rely on the future vision advisory board. Scott suggested another end-to-end benchmark that tracks the whole system, including the networking card and graphics, and somehow accounts for their cost.
@TheKanter will follow up on data center specific submitters.
I think we are out of time to land this for 1.0. I propose merging with Loadgen over network and aim for Inference 1.1. Dilip, what do you think?
Agreed on all counts.
For the March '21 round (1.0?) we would like to see consideration of more timed preprocessing in the datacenter scenarios, specifically for the image models and for 3D-UNet. For edge, it makes sense that the submitter gets to choose the format because the input is often coming from a camera pipeline, but for datacenter it will typically be some form of compressed data (e.g. jpeg).
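For anyone who wants to gauge how much a timed JPEG decode would add on their own host path, a rough measurement is only a few lines of Python; the file name below is a placeholder, and a tuned decoder (libjpeg-turbo, nvJPEG, or a hardware engine) will of course give different numbers.

```python
# Rough per-sample JPEG decode cost; 'sample.jpg' is a placeholder input.
import io
import time

from PIL import Image

def mean_decode_ms(path: str, iters: int = 100) -> float:
    data = open(path, "rb").read()            # read once; time only the decode
    start = time.perf_counter()
    for _ in range(iters):
        Image.open(io.BytesIO(data)).convert("RGB")   # forces full decompression
    return (time.perf_counter() - start) * 1000.0 / iters

print(f"{mean_decode_ms('sample.jpg'):.2f} ms per decode")
```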
Let's discuss in the WG.