mlcommons / power-dev

Dev repo for power measurement for the MLPerf™ benchmarks
https://mlcommons.org/en/groups/best-practices-power
Apache License 2.0

Fixes #288, #289, inference_repo issue number 1335 #298

Closed arjunsuresh closed 1 year ago

github-actions[bot] commented 1 year ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

arjunsuresh commented 1 year ago

I ran the checker in the inference_results_3.0 repository and below are the results.

[2023-05-01 22:28:36,662 submission_checker.py:2651 INFO] ---
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/multistream
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/offline
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults closed/Krai/results/firefly-tflite-v2.11.0-ruy/resnet50/singlestream
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults open/Krai/results/firefly-tflite-v2.11.0-ruy/mobilenet-v1-1.0-128-non-quantized/multistream
[2023-05-01 22:28:36,665 submission_checker.py:2654 ERROR] NoResults open/Krai/results/firefly-tflite-v2.11.0-ruy/mobilenet-v1-1.0-128-non-quantized/offline
[2023-05-01 22:28:36,666 submission_checker.py:2654 ERROR] NoResults open/Krai/results/firefly-tflite-v2.11.0-ruy/mobilenet-v1-1.0-128-non-quantized/singlestream
[2023-05-01 22:28:36,666 submission_checker.py:2657 INFO] ---
[2023-05-01 22:28:36,666 submission_checker.py:2658 INFO] Results=7277, NoResults=7
[2023-05-01 22:28:36,666 submission_checker.py:2661 ERROR] SUMMARY: submission has errors

Although 7 results are reported as failed, there are actually only 2 unique failing results (the others are inferred).

Both are from the same SUT, and the uncertainties occur at the beginning and end of the loadgen testing phase run.

psyhtest commented 1 year ago

Checking the first log, the testing range is set to 0.2 Amps. The warnings have the following timestamps:

Here are the corresponding lines from the testing spl.txt:

dmiskovic-NV commented 1 year ago

TL;DR: If the range is kept constant, lower power will have higher uncertainty.

The uncertainty is the sum of uncertainties that come from: the set ranges for voltage and current (which do not scale with the measured power), the measured value, the power factor, and a few more parasitic effects. To get the uncertainty in %, this sum is divided by the measured power, so it naturally increases as the power decreases.
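To make the scaling argument concrete, here is a minimal sketch with only two of the contributing terms: one proportional to the reading and one fixed by the configured ranges. The coefficients `READING_ERROR` and `RANGE_ERROR`, the 300 V voltage range, and the function name are illustrative assumptions, not values from any analyzer specification; only the 0.2 A current range comes from this thread.

```python
# Hypothetical coefficients for illustration only; real values come from
# the power analyzer's specification sheet.
READING_ERROR = 0.001   # 0.1% of the measured value (assumption)
RANGE_ERROR = 0.0005    # 0.05% of the configured range (assumption)


def relative_uncertainty_pct(measured_watts, volt_range, amp_range):
    """Return the uncertainty as a percentage of the measured power."""
    # Term that scales with the measurement itself.
    reading_term = READING_ERROR * measured_watts
    # Term fixed by the configured ranges, independent of the measurement.
    range_term = RANGE_ERROR * volt_range * amp_range
    return 100.0 * (reading_term + range_term) / measured_watts


# Same ranges throughout; only the measured power changes.
for watts in (40.0, 20.0, 5.0, 1.0):
    pct = relative_uncertainty_pct(watts, volt_range=300.0, amp_range=0.2)
    print(f"{watts:5.1f} W -> {pct:.3f} %")
```

Because the range term stays constant while the divisor shrinks, the percentage grows as the measured power drops, which is the effect described above.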

arjunsuresh commented 1 year ago

Thank you @psyhtest for sharing the details. Actually there is a check to ensure the uncertainty reports are only considered during the loadgen run. And it is just one single sample which is failing this test for both the SUTs.

If I modify the check as follows:

if start_load_time + TIME_DELTA_TOLERANCE < log_time < stop_load_time - TIME_DELTA_TOLERANCE:

it passes. The TIME_DELTA_TOLERANCE being used is 500 ms. Would you recommend committing this change?
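For readers who want to try the condition in isolation, the modified check quoted above can be sketched as a standalone function. The function name `in_checked_window` and the use of `datetime` objects are assumptions made for this example; the actual checker may represent timestamps differently.

```python
from datetime import datetime, timedelta

# 500 ms, the tolerance value mentioned in this thread.
TIME_DELTA_TOLERANCE = timedelta(milliseconds=500)


def in_checked_window(log_time, start_load_time, stop_load_time,
                      tolerance=TIME_DELTA_TOLERANCE):
    """True if the sample lies inside the loadgen run, excluding the margins."""
    return start_load_time + tolerance < log_time < stop_load_time - tolerance


# Hypothetical run boundaries, for illustration only.
start = datetime(2023, 5, 1, 12, 0, 0)
stop = datetime(2023, 5, 1, 12, 10, 0)

# A sample 300 ms after the start falls in the margin and is skipped.
print(in_checked_window(start + timedelta(milliseconds=300), start, stop))
# A sample one minute into the run is still checked.
print(in_checked_window(start + timedelta(minutes=1), start, stop))
```

With this window, samples logged within 500 ms of either boundary are excluded from the uncertainty check, while every other sample in the run is still evaluated.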

psyhtest commented 1 year ago

> And it is just one single sample which is failing this test for both the SUTs.

Interesting. Where does this sample occur? When transitioning from idle to busy or vice versa, I guess?

arjunsuresh commented 1 year ago

Yes @psyhtest, it occurred very close to the start of testing, within a 500 ms interval. Just a guess: this could be due to this issue