mlcommons / logging

MLPerf™ logging library
https://mlcommons.org/en/groups/best-practices-benchmark-infra
Apache License 2.0

Normalize Energy when perf. is normalized to RCP-mean #342

Closed nv-rborkar closed 5 months ago

nv-rborkar commented 9 months ago

In MLPerf Training, performance is sometimes normalized based on a scaling factor in scenarios such as:

Energy (perf * power) should also be normalized in such cases.

github-actions[bot] commented 9 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

nv-rborkar commented 9 months ago

@sgpyc can you review this PR and provide feedback in comments?

erichan1 commented 9 months ago

Just to clarify, we're scaling the power score by the same scaling factor that is applied for the number of steps to train? I'm not saying there shouldn't be a scaling factor, but it's not obvious to me that this is the right way to do it.

nv-rborkar commented 9 months ago

We are scaling the energy score (time * power) whenever time gets scaled by a scaling factor. RCP normalization effectively penalizes a score that was too fast, so energy should be penalized in the same way.
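As a rough illustration of the arithmetic described above (not the code in this PR), the sketch below assumes the scaling factor multiplies the measured time and that average power is left unchanged, so energy ends up scaled by the same factor. The function name `normalize_energy` and its parameters are hypothetical and are not part of the mlcommons logging package.

```python
def normalize_energy(raw_time_s: float, avg_power_w: float, scaling_factor: float) -> dict:
    """Apply one scaling factor to both time and energy.

    Assumption: energy = time * average power, and normalization
    multiplies time by `scaling_factor` while power stays as measured.
    """
    raw_energy_j = raw_time_s * avg_power_w
    normalized_time_s = raw_time_s * scaling_factor
    # Because power is unchanged, this equals raw_energy_j * scaling_factor.
    normalized_energy_j = normalized_time_s * avg_power_w
    return {
        "raw_energy_j": raw_energy_j,
        "normalized_time_s": normalized_time_s,
        "normalized_energy_j": normalized_energy_j,
    }


# Hypothetical numbers: a 100 s run at 500 W average power, penalized by 1.25.
result = normalize_energy(raw_time_s=100.0, avg_power_w=500.0, scaling_factor=1.25)
print(result)
```

With a factor above 1, both the time score and the energy score get worse by the same proportion, which is the behavior argued for here.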

This scenario was not discussed by the power taskforce. This PR has the most logical solution, but we can discuss it further during review and have the power taskforce weigh in as well.

nv-rborkar commented 5 months ago

@pgmpablo157321 can you please review this to confirm it doesn't break any of your recent changes to include power & perf in the result summary?