Copying some of my comments here from Slack, for posterity:
Using max(end_time) - min(start_time) means the result is entirely driven by the slowest model, and is thus highly sensitive to outliers (i.e., bad luck) in training variability. This is of course why we use olympic scoring in the traditional metric reporting. A possible way to mitigate this is to run K models, but drop the last run to finish and compute the time using just the remaining K-1 runs.
David had a further suggestion, which I think was along the lines of using the average runtime across the K concurrent runs. This is more resilient to variability; however, I think it would require us to enforce that all jobs actually run at the same time.
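To make the trade-off concrete, here is a minimal sketch in plain Python (the function names and the example timestamps are hypothetical, not part of the rules text) comparing the raw span metric, the drop-the-slowest variant, and the averaging variant:

```python
# Per-run start/end timestamps (seconds) for K concurrent training runs.

def span_metric(start, end):
    """Wall-clock span: entirely determined by the slowest run."""
    return max(end) - min(start)

def span_metric_drop_slowest(start, end):
    """Drop the run that finishes last, then take the span of the K-1 rest."""
    runs = sorted(zip(start, end), key=lambda se: se[1])[:-1]
    return max(e for _, e in runs) - min(s for s, _ in runs)

def mean_runtime(start, end):
    """Average per-run time; more robust, but assumes the runs overlap."""
    return sum(e - s for s, e in zip(start, end)) / len(start)

# Example with K = 4 runs, one slow outlier:
start = [0.0, 1.0, 0.5, 2.0]
end = [100.0, 105.0, 102.0, 160.0]
print(span_metric(start, end))               # 160.0 (dominated by the outlier)
print(span_metric_drop_slowest(start, end))  # 105.0
print(mean_runtime(start, end))              # 115.875
```

Note that the averaging variant would also reward staggered (non-concurrent) runs, since each run's own duration stays small even if the system never held all K jobs at once; that is why concurrency would need to be enforced for it.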
More discussion points copied from Slack, for posterity:
How exactly do we define a “system”? What level of heterogeneity (partitions with different hardware) do we allow? Do we allow system partitions to be spread out geographically (probably not useful)?
Actually, I think this is irrelevant for this metric, since you have to run at the scale you report. So if you can run on all your cloud instances together, that should be fine; it is unlikely that anyone would do that anyway. This means we probably do not need to define what a system is.
Should we impose a minimum (or fixed) batch size? This would give us some control over runtime and disincentivize folks from running the smallest possible batch size for the longest possible system run time (which disadvantages those who cannot get that amount of system access).
No, in my opinion that should be left open. You can also satisfy a large minimum batch size on a single accelerator if you use gradient accumulation, for example (see the sketch below). I think we should just leave that open.
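As an illustration of that point, here is a minimal, self-contained PyTorch-style sketch of gradient accumulation (the tiny model, random data, and hyperparameters are assumed purely for demonstration), building a large effective batch on a single device:

```python
import torch

# Effective batch size = accum_steps * micro_batch, even on one accelerator:
# gradients from several micro-batches accumulate before a single update.
torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
accum_steps, micro_batch = 8, 4  # effective batch size: 32

opt.zero_grad()
for step in range(64):
    x, y = torch.randn(micro_batch, 16), torch.randn(micro_batch, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average
    loss.backward()                            # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()                             # one update per effective batch
        opt.zero_grad()
```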
@johntran-nv could you take a look and merge?
@azrael417 can you remove WIP from the title? I think it is ready to be merged.
Dear Sir/Madam,
this PR contains the rules changes for MLPerf HPC concerning the updated performance metrics. Please consider this draft a work in progress (WIP), as some details are still being discussed. As such, I encourage members of the MLPerf HPC WG to comment on the PR and refine it before it is ultimately merged.
Best regards,
Thorsten