Normalize Energy when perf. is normalized to RCP-mean

mlcommons / logging

MLPerf™ logging library

https://mlcommons.org/en/groups/best-practices-benchmark-infra

Apache License 2.0


Normalize Energy when perf. is normalized to RCP-mean #342

Status: Closed (nv-rborkar closed 5 months ago)

nv-rborkar commented 9 months ago

In MLPerf Training, performance is sometimes normalized based on a scaling factor in scenarios such as:

A test passes the RCP check but is faster than RCP-mean (the repo checker applies this normalization automatically)
The Reviewers WG decides to normalize a score to RCP-mean for reasons such as RCP failures

Energy (time * power) should also be normalized in such cases.

github-actions[bot] commented 9 months ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

nv-rborkar commented 9 months ago

@sgpyc can you review this PR and provide feedback in comments?

erichan1 commented 9 months ago

Just to clarify, we're scaling the power score by the scaling factor for the number of steps to train? I'm not saying there shouldn't be a scaling factor, but it's not obvious to me that this is the right way to do it.

nv-rborkar commented 9 months ago

We are scaling the energy score (time * power) whenever time gets scaled by a scaling factor. RCP normalization effectively penalizes a score that was too fast, so energy should be penalized in the same way.

This scenario was not discussed by the power taskforce. The PR has the most logical solution, but we can discuss it further during review and have the power taskforce weigh in as well.
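
To make the proposed scaling concrete, here is a minimal sketch, not the PR's actual code. It assumes energy is computed as time multiplied by average power, and that the scaling factor multiplies the measured time (so a value above 1 penalizes a run that was faster than RCP-mean). The names `Result`, `normalize`, and `scaling_factor` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical container for one run's raw measurements."""
    time_s: float       # measured time-to-train, in seconds
    avg_power_w: float  # average power over the run, in watts

    @property
    def energy_j(self) -> float:
        # Energy as described in the thread: time * power.
        return self.time_s * self.avg_power_w


def normalize(result: Result, scaling_factor: float) -> tuple[float, float]:
    """Return (normalized_time_s, normalized_energy_j).

    Assumption: the same factor that scales time is applied to energy,
    which is equivalent to holding average power fixed.
    """
    if scaling_factor <= 0:
        raise ValueError("scaling_factor must be positive")
    norm_time = result.time_s * scaling_factor
    norm_energy = result.energy_j * scaling_factor
    return norm_time, norm_energy


# Made-up numbers, purely illustrative.
run = Result(time_s=600.0, avg_power_w=4000.0)
norm_time, norm_energy = normalize(run, scaling_factor=1.25)
print(norm_time, norm_energy)
```

Under these assumptions, scaling energy by the same factor as time leaves the average power unchanged, so the normalized energy equals the normalized time multiplied by the measured power.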

nv-rborkar commented 5 months ago

@pgmpablo157321 can you please review this to confirm it doesn't break any of your recent changes to include power & perf in the result summary?