mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

hyperparameter borrowing and software stealing rules #416

Closed johntran-nv closed 3 years ago

christ1ne commented 3 years ago

Can we add one line saying that borrowing can happen at most once, to prevent an infinite loop of two submitters borrowing each other's HPs? Is this a real concern?

petermattson commented 3 years ago

John, I just realized that the SW borrowing language is also in this CL. Maybe discuss it to make sure everyone is aware? (Unless we did and I missed it.) Best, Peter

On Thu, Feb 25, 2021 at 9:32 AM johntran-nv notifications@github.com wrote:

@johntran-nv commented on this pull request.

In training_rules.adoc https://github.com/mlcommons/training_policies/pull/416#discussion_r583028391 :

@@ -352,7 +352,14 @@ OPEN: Hyperparameters and optimizer may be freely changed.

=== Hyperparameter Borrowing

-During the review period as described in the Submission Rules, a submitter may replace the hyperparameters in their implementation of a benchmark with hyperparameters from another submitter's implementation of the same benchmark. By default, they may or may not replace batch size but must replace all other hyperparameters as a group. With evidence that the resulting model converges worse in terms of epochs required (taking into account batch size and precision) they may make a minimum number of additional hyperparameter changes in order to achieve comparable convergence in epochs.
+Submitters are expected to use their best efforts to submit with optimal hyperparameters for their system. The intent of Hyperparameter Borrowing is to allow a submitter to update their submission to reflect what they would have submitted had they known about more optimal hyperparameters before submitting, without knowing any other info (ie the performance of other submissions).
+
+During the review period as described in the Submission Rules, a submitter may replace the hyperparameters in their implementation of a benchmark with hyperparameters from another submitter's implementation of the same benchmark. By default, they may change batch size (local batch size, global batch size, batchnorm span), but must replace all other hyperparameters as a group.
+
+With evidence that the resulting model, using the same batch size as the other submitter's implementation, converges worse in terms of epochs required, the submitter may make a minimum number of additional hyperparameter changes for the purpose of improving convergence and achieving comparable, but not better, convergence in epochs compared to the other submitter's implementation, but preserving any difference in convergence that may exist due to precision choices.

Consider removing paragraph at line 359.
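
The "replace as a group" default in the proposed text can be read as a simple check on two hyperparameter sets. The following is a minimal sketch of that reading, not part of the rules or of any MLPerf tooling: the function names, the dictionary representation, and the key names (`local_batch_size`, `global_batch_size`, `batchnorm_span`) are hypothetical, chosen only to mirror the three batch-size items listed in the diff. It does not model the third paragraph (the minimal extra changes allowed with evidence of worse convergence).

```python
# Hypothetical sketch of the default borrowing rule; all names are illustrative.
# Batch-size-related settings that the proposed text lets a borrower change.
BATCH_KEYS = {"local_batch_size", "global_batch_size", "batchnorm_span"}


def borrow_hyperparameters(own, other, keep_own_batch_size=True):
    """Take every hyperparameter from `other`, optionally keeping own batch settings."""
    borrowed = dict(other)
    if keep_own_batch_size:
        for key in BATCH_KEYS:
            if key in own:
                borrowed[key] = own[key]
    return borrowed


def is_valid_borrow(result, other):
    """True if `result` matches `other` on every hyperparameter outside BATCH_KEYS."""
    keys = (set(result) | set(other)) - BATCH_KEYS
    return all(
        k in result and k in other and result[k] == other[k] for k in keys
    )
```

Under this reading, a borrower who keeps their own global batch size but copies everything else passes the check, while one who copies only the learning rate and keeps their own warmup schedule does not.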


johntran-nv commented 3 years ago

@petermattson I believe we did discuss in the Training WG. How about I send it out to the Training email alias to give folks another chance to chime in, if they want, and if no one objects in a couple of days, we can merge?

petermattson commented 3 years ago

Sounds good.



github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

tjablin commented 3 years ago

How many days after the submission deadline do submitters have to borrow software?

johntran-nv commented 3 years ago

@christ1ne (or anyone else who wants), I believe I've addressed all the feedback, but want another set of eyes on it before I merge. Could you please re-review?

johntran-nv commented 3 years ago

@christ1ne, @DilipSequeira, are we good to merge now?

DilipSequeira commented 3 years ago

lgtm

christ1ne commented 3 years ago

LGTM