mlcommons / power-dev

Dev repo for power measurement for the MLPerf™ benchmarks
https://mlcommons.org/en/groups/best-practices-power
Apache License 2.0

Submission checker: Time difference checker needs to be modified #239

Closed s-idgunji closed 3 years ago

s-idgunji commented 3 years ago

The time difference checker, which checks the performance delta between the ranging and testing phases, has requirements that conflict with those of the Inference WG. An issue was raised in the last round: one of the submissions had a difference of more than 5%, which was the acceptable threshold. It turned out that the testing (measure) phase was slower. If the submitter (NVIDIA) had rerun, their results could have come within the threshold. However, the Inference WG requires that any updated submission be no more than a fraction of a percent off the original. So these two objectives are in conflict. If the goal is to check that the flow was carried out as expected, then we could relax this check or find another way to meet both the Inference WG and Power WG objectives.
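For illustration only, the symmetric check described above can be sketched as follows. This is not the checker's actual code: the function names `relative_delta` and `within_tolerance` are hypothetical, the performance values are assumed to be "higher is faster" metrics, and the 5% default is taken from the threshold mentioned in this comment.

```python
def relative_delta(ranging_perf: float, testing_perf: float) -> float:
    """Relative performance change of the testing run versus the ranging run.

    Negative means the testing phase was slower than the ranging phase.
    (Hypothetical helper; assumes a higher-is-faster metric.)
    """
    if ranging_perf <= 0:
        raise ValueError("ranging_perf must be positive")
    return (testing_perf - ranging_perf) / ranging_perf


def within_tolerance(ranging_perf: float, testing_perf: float,
                     tolerance: float = 0.05) -> bool:
    """Symmetric check: pass only if the two phases differ by at most `tolerance`."""
    return abs(relative_delta(ranging_perf, testing_perf)) <= tolerance


# A testing phase that is 6% slower than ranging fails the symmetric check,
# which is the situation described in this issue.
print(within_tolerance(100.0, 94.0))
print(within_tolerance(100.0, 96.0))
```

Because the check is symmetric, a slower testing phase is rejected just like a faster one, which is where the conflict with the Inference WG resubmission rule comes from.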

[x] Check client sources checksum

[x] Check server sources checksum

[x] Check PTD commands and replies

[x] Check UUID

[x] Check session name

[x] Check time difference

[x] Check client server messages

[x] Check results checksum

[x] Check errors and warnings from PTD logs

[x] Check PTD configuration

[x] Check debug is disabled on server-side

s-idgunji commented 3 years ago

Hi Tejus, I do not know if we have any resolution for this yet. But clearly we need to change this to a warning rather than a check that forces a resubmission to bring the results within the acceptable range, because that may conflict with the Inference policy that a resubmission must be within a very small performance delta (< 1%) of the original submission.

Another option would be to optionally file an issue requesting resubmission if the "performance" phase is faster than the "ranging" phase by more than the tolerance bound we have in v1.0. We need to check whether that is acceptable under Inference policies.

araghun commented 3 years ago

Power WG Discussions:

If the ranging result > the testing result, then power during ranging was likely higher, so the limits set after ranging will hold. This case is good, and no resubmission is needed. The other case will need more scrutiny (resubmissions, etc.). More folks are asked to comment here.
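A rough sketch of the asymmetric rule discussed above, for readers who want to see it spelled out. Everything here is an assumption for illustration: the `Verdict` enum and `classify_phases` function are hypothetical names, performance is assumed to be a higher-is-faster metric, and applying the 5% tolerance to the "testing faster than ranging" case combines this comment with the tolerance bound mentioned earlier in the thread.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"                      # limits set during ranging should hold
    NEEDS_REVIEW = "needs_review"  # testing ran faster; power may exceed the range


def classify_phases(ranging_perf: float, testing_perf: float,
                    tolerance: float = 0.05) -> Verdict:
    """Asymmetric check (hypothetical): only a faster testing phase is flagged."""
    if ranging_perf <= 0:
        raise ValueError("ranging_perf must be positive")
    delta = (testing_perf - ranging_perf) / ranging_perf
    if delta > tolerance:
        # Testing outperformed ranging by more than the tolerance, so the
        # power drawn may be higher than what ranging observed.
        return Verdict.NEEDS_REVIEW
    # Ranging was as fast or faster (or the gap is within tolerance):
    # no resubmission needed under the rule discussed here.
    return Verdict.OK


print(classify_phases(100.0, 94.0).value)
print(classify_phases(100.0, 108.0).value)
```

Under this rule, the case that triggered the issue (a slower testing phase) would pass without a resubmission, while a noticeably faster testing phase would still be examined.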

s-idgunji commented 3 years ago

I think this is agreed, given the discussions in the Power WG; no need to change code.