mlcommons / power-dev

Dev repo for power measurement for the MLPerf™ benchmarks
https://mlcommons.org/en/groups/best-practices-power
Apache License 2.0

Fixes #301: compare avg. power, not the time duration, between the ranging and testing runs #311

Closed. arjunsuresh closed this 1 year ago

github-actions[bot] commented 1 year ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

arjunsuresh commented 1 year ago

This PR downgrades the time-duration check between the ranging and testing modes to a warning and instead adds an avg_power delta check between the ranging and testing runs. When run on the 3.0 submissions, it correctly captures all the problematic runs (those with power factor < 0.5), which are listed below.

[2023-05-27 22:41:51,210 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852
[2023-05-27 22:41:51,318 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852
[2023-05-27 22:41:51,416 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852
[2023-05-27 22:42:22,225 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 11.54 with avg. ranging power 25.6165, avg.testing power 22.65936, avg. ranging power factor 0.32886 and avg. testing power factor 0.34153
[2023-05-27 22:42:22,599 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.97 with avg. ranging power 33.31222, avg.testing power 27.3272, avg. ranging power factor 0.34605 and avg. testing power factor 0.36852
[2023-05-27 22:42:23,292 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.97 with avg. ranging power 33.31222, avg.testing power 27.3272, avg. ranging power factor 0.34605 and avg. testing power factor 0.36852
[2023-05-27 22:42:23,761 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 15.38 with avg. ranging power 30.59485, avg.testing power 25.88963, avg. ranging power factor 0.34171 and avg. testing power factor 0.36201
[2023-05-27 22:42:23,894 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 14.67 with avg. ranging power 28.38532, avg.testing power 24.22208, avg. ranging power factor 0.33281 and avg. testing power factor 0.3509
[2023-05-27 22:42:24,071 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 6.6 with avg. ranging power 21.24838, avg.testing power 19.84668, avg. ranging power factor 0.3191 and avg. testing power factor 0.32579
[2023-05-27 22:42:24,404 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.47 with avg. ranging power 32.95002, avg.testing power 27.19334, avg. ranging power factor 0.34518 and avg. testing power factor 0.3681
[2023-05-27 22:42:24,785 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 6.46 with avg. ranging power 22.11432, avg.testing power 20.68572, avg. ranging power factor 0.32608 and avg. testing power factor 0.33329
[2023-05-27 22:42:24,912 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 15.66 with avg. ranging power 30.25339, avg.testing power 25.51615, avg. ranging power factor 0.33882 and avg. testing power factor 0.35856
[2023-05-27 22:42:25,036 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 11.65 with avg. ranging power 25.45326, avg.testing power 22.48787, avg. ranging power factor 0.32453 and avg. testing power factor 0.34053
[2023-05-27 22:42:25,444 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.97 with avg. ranging power 33.31222, avg.testing power 27.3272, avg. ranging power factor 0.34605 and avg. testing power factor 0.36852
[2023-05-27 22:42:26,133 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.97 with avg. ranging power 33.31222, avg.testing power 27.3272, avg. ranging power factor 0.34605 and avg. testing power factor 0.36852
[2023-05-27 22:44:42,639 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.1 with avg. ranging power 5.25504, avg.testing power 4.98719, avg. ranging power factor 0.44714 and avg. testing power factor 0.45423
[2023-05-27 22:44:42,783 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.1 with avg. ranging power 5.25504, avg.testing power 4.98719, avg. ranging power factor 0.44714 and avg. testing power factor 0.45423
[2023-05-27 22:44:43,067 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.1 with avg. ranging power 5.25504, avg.testing power 4.98719, avg. ranging power factor 0.44714 and avg. testing power factor 0.45423
[2023-05-27 22:44:44,698 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.87 with avg. ranging power 4.86738, avg.testing power 4.58189, avg. ranging power factor 0.42618 and avg. testing power factor 0.43513
[2023-05-27 22:44:44,873 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.87 with avg. ranging power 4.86738, avg.testing power 4.58189, avg. ranging power factor 0.42618 and avg. testing power factor 0.43513
[2023-05-27 22:44:45,033 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.87 with avg. ranging power 4.86738, avg.testing power 4.58189, avg. ranging power factor 0.42618 and avg. testing power factor 0.43513
[2023-05-27 22:45:03,432 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.22 with avg. ranging power 5.02207, avg.testing power 4.76016, avg. ranging power factor 0.43979 and avg. testing power factor 0.44758
[2023-05-27 22:45:03,593 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.22 with avg. ranging power 5.02207, avg.testing power 4.76016, avg. ranging power factor 0.43979 and avg. testing power factor 0.44758
[2023-05-27 22:45:03,795 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.22 with avg. ranging power 5.02207, avg.testing power 4.76016, avg. ranging power factor 0.43979 and avg. testing power factor 0.44758
[2023-05-27 22:45:51,666 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.59 with avg. ranging power 3.50346, avg.testing power 3.30767, avg. ranging power factor 0.32449 and avg. testing power factor 0.31663
[2023-05-27 22:45:51,788 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.59 with avg. ranging power 3.50346, avg.testing power 3.30767, avg. ranging power factor 0.32449 and avg. testing power factor 0.31663
[2023-05-27 22:45:51,902 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 5.59 with avg. ranging power 3.50346, avg.testing power 3.30767, avg. ranging power factor 0.32449 and avg. testing power factor 0.31663
[2023-05-27 22:45:58,134 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.7 with avg. ranging power 4.73192, avg.testing power 3.89438, avg. ranging power factor 0.3675 and avg. testing power factor 0.3418
[2023-05-27 22:45:58,274 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.7 with avg. ranging power 4.73192, avg.testing power 3.89438, avg. ranging power factor 0.3675 and avg. testing power factor 0.3418
[2023-05-27 22:45:58,418 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 17.7 with avg. ranging power 4.73192, avg.testing power 3.89438, avg. ranging power factor 0.3675 and avg. testing power factor 0.3418
[2023-05-27 22:46:14,822 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 6.48 with avg. ranging power 4.0466, avg.testing power 3.78426, avg. ranging power factor 0.34751 and avg. testing power factor 0.33814
[2023-05-27 22:46:15,061 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 6.48 with avg. ranging power 4.0466, avg.testing power 3.78426, avg. ranging power factor 0.34751 and avg. testing power factor 0.33814
[2023-05-27 22:46:15,294 power_checker.py:739 ERROR]    Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 6.48 with avg. ranging power 4.0466, avg.testing power 3.78426, avg. ranging power factor 0.34751 and avg. testing power factor 0.33814
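The avg_power delta check described above can be sketched in Python roughly as follows. This is not the code from power_checker.py: the function names and the 5% constant are assumptions. The delta is taken relative to the ranging run because that reproduces the magnitudes in the log (10.03 for 12.64605 vs 11.37794). The power factors printed in the log, and the duration check that now only warns, are left out for brevity.

```python
import logging

# Assumed threshold, taken from the "> 5%" in the log messages above.
MAX_AVG_POWER_DELTA_PCT = 5.0


def avg_power_delta_pct(ranging_power: float, testing_power: float) -> float:
    """Magnitude of the change in average power, relative to the ranging run, in percent."""
    return abs(ranging_power - testing_power) / ranging_power * 100.0


def check_avg_power_delta(ranging_power: float, testing_power: float) -> bool:
    """Return True if both runs agree within the threshold; otherwise log an error and return False."""
    delta = avg_power_delta_pct(ranging_power, testing_power)
    if delta > MAX_AVG_POWER_DELTA_PCT:
        logging.error(
            "Avg. power delta between the ranging and testing mode run is > %g%%. "
            "Observed delta is %.2f with avg. ranging power %s, avg. testing power %s",
            MAX_AVG_POWER_DELTA_PCT, delta, ranging_power, testing_power)
        return False
    return True


# Values from the first error line above: the delta is about 10.03%, so the check fails.
print(check_avg_power_delta(12.64605, 11.37794))  # False
```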
psyhtest commented 1 year ago

Many thanks, @arjunsuresh. A couple of questions.

Why are some values repeated? Are those the cases where Offline and MultiStream are inferred from SingleStream?

[2023-05-27 22:41:51,210 power_checker.py:739 ERROR] Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852
[2023-05-27 22:41:51,318 power_checker.py:739 ERROR] Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852
[2023-05-27 22:41:51,416 power_checker.py:739 ERROR] Avg. power delta between the ranging and testing mode run is > 5%. Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794, avg. ranging power factor 0.46866 and avg. testing power factor 0.45852

In this particular case,

Observed delta is 10.03 with avg. ranging power 12.64605, avg.testing power 11.37794

The delta is: 12.64605 - 11.37794 = 1.26811.

It seems you mean the delta in percent: 1.26811 / 12.64605 = 0.100277 = 10.03%

I would probably show the difference as negative, to indicate that the measured average power has decreased, not increased, from ranging to testing:

11.37794 / 12.64605 - 1 = -0.100277 = -10.03%
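The arithmetic above can be checked with a short Python sketch. The helper name signed_delta_pct is hypothetical; it uses the ranging run as the reference, so a negative result means the testing run drew less average power.

```python
def signed_delta_pct(ranging_power: float, testing_power: float) -> float:
    """Signed change in average power from ranging to testing, in percent.

    Negative means the testing run drew less average power than the ranging run.
    """
    return (testing_power / ranging_power - 1.0) * 100.0


ranging, testing = 12.64605, 11.37794
print(f"absolute delta: {ranging - testing:.5f}")                      # 1.26811
print(f"relative delta: {signed_delta_pct(ranging, testing):.2f}%")    # -10.03%
```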

arjunsuresh commented 1 year ago

Thank you @psyhtest for your feedback. Yes, the repeated values are for the inferred scenarios and models.

Yes, the delta is a percentage. I have now added % to the output. I agree that there should be a negative sign too.

But should we also take the scenario into account and compare joules per inference between the ranging and testing runs?

arjunsuresh commented 1 year ago

Hi @psyhtest, is the message below fine?

[2023-05-28 14:14:46,709 power_checker.py:741 ERROR]    Average power during the testing mode run is lower than that during the ranging run by more than 5%. Observed delta is -17.97% with avg. ranging power 33.31222, avg.testing power 27.3272, avg. ranging power factor 0.34605 and avg. testing power factor 0.36852
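A direction-aware version of this message could be produced along the following lines. This is a hypothetical sketch, not the checker's code: power_delta_message and its threshold_pct default are assumptions, and the power factors are omitted. It picks "lower" or "higher" from the sign of the delta and prints the signed percentage.

```python
from typing import Optional


def power_delta_message(ranging_power: float, testing_power: float,
                        threshold_pct: float = 5.0) -> Optional[str]:
    """Return a direction-aware error message, or None if the runs agree within the threshold."""
    # Signed change relative to the ranging run; negative means testing drew less power.
    delta_pct = (testing_power / ranging_power - 1.0) * 100.0
    if abs(delta_pct) <= threshold_pct:
        return None
    direction = "lower" if delta_pct < 0 else "higher"
    return (f"Average power during the testing mode run is {direction} than that "
            f"during the ranging run by more than {threshold_pct:g}%. "
            f"Observed delta is {delta_pct:.2f}% with avg. ranging power "
            f"{ranging_power}, avg. testing power {testing_power}")


print(power_delta_message(33.31222, 27.3272))
```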

I have added support for computing the power-efficiency delta between the ranging and testing runs to the inference submission checker, since the calculation requires knowledge of the scenario. It has not yet been added as a check.
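A joules-per-inference comparison could look roughly like this, assuming each run's average power, duration, and inference count are available. RunSummary and its fields are hypothetical and are not the submission checker's API. What counts as one "inference" (a sample or a query) depends on the scenario, which is why the scenario must be known. This sketch does not model that and uses a single generic count.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-run summary; field names are illustrative only."""
    avg_power_w: float      # average power over the run, in watts
    duration_s: float       # run duration, in seconds
    num_inferences: int     # inferences completed during the run


def joules_per_inference(run: RunSummary) -> float:
    # Energy (J) = average power (W) x duration (s), divided over all inferences.
    return run.avg_power_w * run.duration_s / run.num_inferences


def efficiency_delta_pct(ranging: RunSummary, testing: RunSummary) -> float:
    """Signed change in joules per inference from ranging to testing, in percent."""
    return (joules_per_inference(testing) / joules_per_inference(ranging) - 1.0) * 100.0
```

Unlike a raw avg_power comparison, this normalizes by how much work each run did. Two runs can therefore differ in duration or throughput and still be compared on efficiency.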

arjunsuresh commented 1 year ago

Hi @psyhtest, we are no longer pursuing this change. Please feel free to reopen and merge it if you think it is useful.