mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

[HPC] Proposal: Remove open division #510

Open nvaprodromou opened 1 year ago

nvaprodromou commented 1 year ago

Introduction:

After collecting feedback from engineers, clients, and press, NVIDIA presented a list of proposals that aim to improve the popularity of the MLPerf HPC benchmark suite. Please see our slide deck for more information on our feedback gathering process and insights.

Proposal: Remove open division from results table

Slide 12 in proposals slide deck.

We propose to remove the open division from the HPC benchmark because it is rarely utilized and it unnecessarily complicates the benchmark.

This proposal aims to improve the popularity of the MLPerf HPC benchmark suite by improving on the following aspects:

  1. Simplifies results parsing and understanding [Improves press interest]

Discussion

Pros:

  1. Quick, easy fix
  2. Reduces the public's and reporters' confusion when faced with our results

Cons:

  1. Some reporters have already invested time to understand our current terminology.
    • We believe that reporters will appreciate the clarity and simplicity of the revised terminology.
coquelin77 commented 1 year ago

I think that removing the division entirely is not the best way to go about this. I think the reason it is not taken up is that it is very hard to beat optimized code with an open-division method. However, if we want to target HPC research communities more, we could create an 'Emerging Algorithms' division.

This division would not attempt to compare against the optimized runtime, but instead compare against either the runtime of the un-optimized code (without graphs and the like), the number of iterations until completion times the world size, the number of training steps times the global batch size, or something similar.
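For concreteness, here is a minimal sketch of how two of these normalized comparisons could be computed. The numbers and variable names below are purely hypothetical and are not drawn from any MLPerf HPC rule:

```python
# Purely illustrative sketch of the normalization metrics suggested above.
# All values are hypothetical; nothing here comes from the MLPerf HPC rules.

world_size = 1024          # assumed number of workers (e.g. GPUs) in the run
global_batch_size = 4096   # assumed samples per training step across all workers
steps_to_converge = 1500   # assumed training steps until the quality target was reached

# "number of iterations until completion times the world size"
iteration_work = steps_to_converge * world_size

# "number of training steps times the global batch size" (i.e. total samples processed)
samples_processed = steps_to_converge * global_batch_size

print(f"iteration-work metric:   {iteration_work:,}")
print(f"total samples processed: {samples_processed:,}")
```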

I think that it is very important to have something less onerous for researchers to compare to. Why would a researcher want to submit their new algorithm to the open section if they will only look bad? Furthermore, how many researchers have time to fully optimize their code to the level required for something like this?

I agree that the current formulation of the open division primarily adds confusion, but I feel that opening up a place for those who want to showcase new ways of training NNs at scale would make MLPerf HPC stand out from Training much more than the UCs alone do.

sparticlesteve commented 1 year ago

My comments on this one

nvaprodromou commented 1 year ago

Thanks for the feedback. I'll respectfully disagree with a couple of the points made, but first, I do agree that it's valuable as a fallback for failed submissions in some cases. In combination with other proposals, the fallback can become the submission of the previous round (since some proposals allow for older results to carry over).

(1) The open division hurts us due to the added complexity that deters the public from focusing on the main part of the benchmark suite (the closed division). The added complexity lies not only in understanding its rules, but also in understanding its reason for existence; most importantly, it adds two entire tabs to the results table. Try to think about this as somebody who just wants to check out the latest MLPerf HPC results after seeing a post on Twitter about it (presumably on a mobile device), but doesn't necessarily have a strong grasp of the policies (this describes the average person and many reporters). The added complexity is a deterrent and will have them leave the site quickly and frustrated, without having gained much insight, perhaps none. We can't expect the audience to spend days understanding the policies just to get a quick glimpse of the results.

(2) If we are not comparing algorithmic approaches with the state-of-the-art performance, then what's the point of keeping it within the same benchmark suite? I (sort of) understand the value, but why does it need to be displayed alongside the "golden" results and overpopulate an already hard-to-read results table? Also, updating the rules to add exceptions on what open division results can be compared with further adds complexity, making the issue worse.

To summarize, I do think it hurts us a lot to have the open division, not because it's inherently bad, but because it distracts the audience. The problem could be solved with a good redesign of the results table presentation, but even then, the open division would remain an annoyance in almost all cases.

coquelin77 commented 1 year ago

I agree that the fallback could be the previous submission, this is something that I fully support.

I also agree that the added complexity of the open division hurts us, and I agree with why. However, I disagree with the solution; not because it wouldn't work, but because I believe it would reduce the number of people checking out the results and the number of people motivated to participate.

Regarding the confusion in the rules, if we replace the open division with 'Emerging Algorithms' we can simplify the division greatly: fixed model structure, data-parallel training without data overlap, and validation after each epoch.

The point of keeping something like this in the benchmarking suite is simple: it draws users and consumers in to view the SOTA methods for training networks in HPC settings. If we have a place for researchers to push the boundaries of both accuracy and speed, it draws people to the benchmarking suite because it is not simply about speed. How to show the results for this division is something that is open to discussion.

An Emerging Algorithms division allows researchers to justify submitting to the benchmarks in the publish-or-perish political climate. My bosses have asked me multiple times about the utility of these benchmarks and what we have to show for them.

While I think that removing the division would help in some ways, I think it's a missed opportunity.

I would like to also note that the open division was what drew my group to the benchmark suite in the first place.

TheKanter commented 1 year ago

Hi Andreas - RE: (1) above, I wouldn't get hung up on the current results display. We are redoing the results presentation, so making decisions predicated on the current display isn't a good plan IMO.

From our previous discussions, my impression and the consensus were that the open division is actually more essential to HPC than to Training, given the breadth of approaches to HPC problems. Has that changed?

nvaprodromou commented 1 year ago

@TheKanter I'm not just focusing on the results display, but rather on the overall overhead the open division adds when reviewing/discussing/presenting results. As far as the open division itself, nothing has really changed. Its theoretical value is still what it used to be. In practice, however, it is not utilized to its theoretical capacity (in fact, it's barely utilized). The net result is an item that needs to be discussed with press but does not really add value to the conversation; it only adds complexity. Even if it just takes ten minutes to respond to a reporter's question "So, what is open division anyway?", that's ten minutes that don't add value and may be enough to lose an audience that's not that interested in the subject to begin with.

The proposal just aims to simplify parsing of our results by the average person and/or reporter. It's certainly not expected to directly affect our problem of limited participation and competition as meaningfully as other proposals. Indirectly, however, the reasoning is that if we make it easier for the press to understand (1) what the benchmark is and (2) what the results say, then we expect increased interest (and higher-quality articles) from the press and public, which in turn results in increased participation and competition.

This is a proposal we believe is a step in the right direction, but of course the group has the final say.