mlcommons / power-dev

Dev repo for power measurement for the MLPerf™ benchmarks
https://mlcommons.org/en/groups/best-practices-power
Apache License 2.0

Request to relax the power checker rules #293

Closed arjunsuresh closed 1 year ago

arjunsuresh commented 1 year ago

The power checker has a rule that requires the ranging mode and testing mode runs to be of similar duration, with a maximum allowed tolerance of 5%. This 5% tolerance is often not enough for real-world systems where the processor clock frequency is not fixed: system performance can drop when the system reaches its thermal limit. For the inference 3.0 submissions we had 2 results where this indeed happened, and this issue has more details. To the best of my knowledge, ranging mode only estimates the maximum current/voltage ranges, so requiring the ranging and testing modes to have similar runtimes should never affect the power measurement. If this check is needed at all, it should be restricted to the following condition:

run time during testing >= 0.95 * run time during ranging

and not the other way around. This is because if for some reason the testing mode run is faster, it should be drawing more instantaneous power, and the current/voltage ranges estimated during the ranging phase may no longer be valid. But if the testing mode run is slower, there should be no problem with the measurements.
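
To make the difference concrete, here is a minimal sketch of the two rules side by side. It is not the power checker's actual code: the function names, the use of seconds, and the exact form of the symmetric rule (difference measured relative to the ranging duration) are assumptions made for illustration only.

```python
# Hypothetical sketch, not the actual power checker implementation.
# Durations are assumed to be in seconds.

def symmetric_check(ranging_s: float, testing_s: float, tolerance: float = 0.05) -> bool:
    """Assumed form of the existing rule: durations must be within
    `tolerance` of each other, measured relative to the ranging run."""
    return abs(testing_s - ranging_s) <= tolerance * ranging_s


def one_sided_check(ranging_s: float, testing_s: float, tolerance: float = 0.05) -> bool:
    """Proposed rule: only reject a testing run that is noticeably
    FASTER than the ranging run (testing >= 0.95 * ranging)."""
    return testing_s >= (1.0 - tolerance) * ranging_s


if __name__ == "__main__":
    ranging = 600.0
    for testing in (590.0, 660.0, 540.0):
        print(
            f"ranging={ranging:.0f}s testing={testing:.0f}s "
            f"symmetric={symmetric_check(ranging, testing)} "
            f"one_sided={one_sided_check(ranging, testing)}"
        )
```

With these assumed definitions, a testing run that is 10% slower than ranging (for example because of thermal throttling) fails the symmetric rule but passes the one-sided one, while a testing run that is 10% faster fails both.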