mlcommons / power-dev

Dev repo for power measurement for the MLPerf™ benchmarks
https://mlcommons.org/en/groups/best-practices-power
Apache License 2.0

Request to remove the 5% time duration delta check between the ranging and testing modes #301

Closed: arjunsuresh closed this issue 1 year ago

arjunsuresh commented 1 year ago

As I understand it, the purpose of the time-duration check is to ensure similar performance between the ranging-mode and testing-mode runs. However, inference currently has four scenarios, and three of them have early stopping enabled, so when appropriate inputs are provided each run stops automatically after 10 minutes. In these scenarios the checker passes even when performance differs by more than 5%. The result below, from the 3.0 inference results, is an example of this:

| Ranging mode | Testing mode | Latency delta | Code   |
|--------------|--------------|---------------|--------|
| 125.062243   | 116.239871   | 7.59%         | result |
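To make the arithmetic concrete, here is a minimal Python sketch that applies a 5% relative-delta test to the two latency values in the table. It assumes the delta is measured relative to the testing-mode value, which reproduces the reported 7.59%. The function names are hypothetical, and the actual power-dev checker may compute its delta differently. Applied to latency, the 5% threshold fails, whereas the existing check compares run durations, which early stopping keeps close to 10 minutes in both modes.

```python
# Illustrative sketch only: the helper names below are hypothetical and are
# not taken from the power-dev checker, whose exact formula may differ.

def relative_delta_pct(ranging: float, testing: float) -> float:
    """Absolute difference between two measurements, in percent of the
    testing-mode value (this choice reproduces the 7.59% in the table)."""
    if testing == 0:
        raise ValueError("testing-mode value must be non-zero")
    return abs(ranging - testing) / testing * 100.0


def within_limit(ranging: float, testing: float, limit_pct: float = 5.0) -> bool:
    """True if the two measurements differ by at most limit_pct percent."""
    return relative_delta_pct(ranging, testing) <= limit_pct


if __name__ == "__main__":
    # Latency values from the ranging- and testing-mode runs in the table.
    ranging_latency, testing_latency = 125.062243, 116.239871
    delta = relative_delta_pct(ranging_latency, testing_latency)
    print(f"latency delta: {delta:.2f}%")  # prints "latency delta: 7.59%"
    print("within 5%:", within_limit(ranging_latency, testing_latency))  # prints "within 5%: False"
```

Measured relative to the ranging-mode value instead, the same pair gives about 7.05%, so the reference run should be fixed explicitly in any such check.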

Since the check is effective for only one of the four scenarios (Offline), and since it can fail on systems with high run-to-run (r2r) variation, as happened here, I request that this check be removed in upcoming rounds.

Here is another result where inference during the testing-mode run paused for 15 s. This pause lowered the measured power by about 1%, yet performance actually went up by 2.87%. Had the 15 s pause not occurred, the time-duration delta between the ranging and testing modes would have been >5%.

arjunsuresh commented 1 year ago

This was discussed in the Inference WG meeting, and there were no objections. Waiting on the Power WG.

arjunsuresh commented 1 year ago

If the check is indeed required, then instead of the time-duration delta, the performance-per-watt delta should be compared and checked to be within X%.
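A perf-per-watt comparison could look like the following minimal Python sketch. It assumes a higher-is-better performance metric (for example samples per second) and an average power in watts for each run, and it measures the delta relative to the testing-mode run. The function names are hypothetical, and the threshold is left as a parameter because X% has not been specified. The usage plugs in the approximate relative figures from the second example (performance up 2.87%, power down about 1%), normalized so that the ranging run is 1.0.

```python
# Hypothetical sketch of a perf/W comparison; not part of power-dev.

def perf_per_watt(performance: float, avg_power_watts: float) -> float:
    """Higher-is-better performance (e.g. samples/s) per watt of average power."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return performance / avg_power_watts


def perf_per_watt_delta_pct(ranging_perf: float, ranging_power: float,
                            testing_perf: float, testing_power: float) -> float:
    """Relative perf/W difference in percent, taken relative to the testing run."""
    ranging = perf_per_watt(ranging_perf, ranging_power)
    testing = perf_per_watt(testing_perf, testing_power)
    return abs(ranging - testing) / testing * 100.0


def within_limit(delta_pct: float, limit_pct: float) -> bool:
    """limit_pct stands in for the unspecified X% threshold."""
    return delta_pct <= limit_pct


if __name__ == "__main__":
    # Normalized: ranging run = 1.0; testing run has +2.87% perf, about -1% power.
    delta = perf_per_watt_delta_pct(1.0, 1.0, 1.0287, 0.99)
    print(f"perf/W delta: {delta:.2f}%")  # prints "perf/W delta: 3.76%"
```

Unlike a duration comparison, this value still changes when early stopping holds both runs to about 10 minutes, because it is computed from performance and power rather than from elapsed time.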