mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

TEST04 update for Inference v3.1 #1350

Open pgmpablo157321 opened 1 year ago

pgmpablo157321 commented 1 year ago

Currently, TEST04 consists of running performance mode with only one sample.

This creates the issue that the test doesn't apply to benchmarks whose datasets have very unbalanced sample input sizes or very unbalanced processing times. Currently the test only applies to ResNet50.

Requirements:

Initial Proposal

Other considerations

This is not the final proposal to fix this test. Please feel free to discuss here other possible ways to solve this issue

arjunsuresh commented 1 year ago

I think we can do something like this. I'm considering retinanet as an example here: performance_sample_count=64, so take 64 unique inputs.

Repeat inp1 N times, followed by inp2 N times and so on. Let the time taken be t1. (Maximum cache hits)

Then run inp1 followed by inp2 ... inp64 and repeat this N times. Let the time taken be t2. (Minimum cache hits)

Compare t1 and t2 for compliance.

The minimum value of N should be 2 to ensure t1 is benefiting from caching. In the worst case, the runtime will be 4 times the accuracy run time, as there are two runs and each run processes twice the performance_sample_count number of inputs. Submitters should be free to increase the value of N on faster devices, since a short runtime can cause larger variation and test failure.

This modification can make TEST04 applicable to all the benchmarks.
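The two orderings above can be sketched as follows. This is an illustrative sketch only, not LoadGen API: the function names and the direct index lists are assumptions for clarity, using retinanet's performance_sample_count=64 and N=2 from the example.

```python
def max_cache_hit_order(sample_indices, n):
    """Repeat inp1 n times, then inp2 n times, and so on (maximum cache hits)."""
    order = []
    for idx in sample_indices:
        order.extend([idx] * n)
    return order


def min_cache_hit_order(sample_indices, n):
    """Run inp1, inp2, ..., inpK once each, and repeat that cycle n times
    (minimum cache hits)."""
    return list(sample_indices) * n


# Example: performance_sample_count = 64 (retinanet), N = 2.
samples = list(range(64))
t1_order = max_cache_hit_order(samples, 2)  # e.g. [0, 0, 1, 1, 2, 2, ...]
t2_order = min_cache_hit_order(samples, 2)  # e.g. [0, 1, ..., 63, 0, 1, ...]

# Both orderings issue exactly the same set of queries; only the order
# differs, so any gap between t1 and t2 reflects caching effects.
assert sorted(t1_order) == sorted(t2_order)
```

Timing each ordering and comparing t1 against t2 within some tolerance would then be the compliance check.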

pgmpablo157321 commented 1 year ago

Ideally, this test should be applicable to rnnt, bert, dlrm, and retinanet.