mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training

Clarify BERT Eval Boundary #362

Open bitfort opened 4 years ago

bitfort commented 4 years ago

Proposal:

We seek to clarify how batch boundaries should be handled for BERT evaluation.

(1) The rules state: "BERT Starting at 3M samples, then every 500K samples"

(2) We believe the following section of the rules also applies: 9.5. Equivalence exceptions "If data set size is not evenly divisible by batch size, one of several techniques may be used. The last batch in an epoch may be composed of the remaining samples in the epoch, may be padded, or may be a mixed batch composed of samples from the end of one epoch and the start of the next. If the mixed batch technique is used, quality for the ending epoch must be evaluated after the mixed batch. If the padding technique is used, the first batch may be padded instead of the last batch."

(3) We believe that the BERT reference evaluates every 499,992 samples (which is every 20833 batches with a batch size of 24). Thus, we believe the reference is "rounding down" the eval interval to a whole number of batches, which is not in line with our understanding of the batch-boundary rules (see the first sketch below this list).

(4) We believe an appropriate way to handle BERT evaluation is to fill the last batch of the eval window using the "mixed batch" approach (i.e. adding extra examples to fill up the last batch); see the second sketch below. There may be other appropriate ways of handling this.
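
A minimal arithmetic sketch of point (3), assuming the reference's global batch size of 24 and the 500K-sample interval from the rules (the names here are ours, not from the reference code):

```python
BATCH_SIZE = 24        # reference global batch size
INTERVAL = 500_000     # nominal eval interval from the rules

# Rounding the interval down to a whole number of batches gives 20,833 batches,
# i.e. 20,833 * 24 = 499,992 samples, so each eval boundary lands 8 samples
# earlier than the nominal 500K boundary.
batches_rounded_down = INTERVAL // BATCH_SIZE             # 20833
samples_rounded_down = batches_rounded_down * BATCH_SIZE  # 499992

print(batches_rounded_down, samples_rounded_down, INTERVAL - samples_rounded_down)
# -> 20833 499992 8
```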
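And a sketch of the "mixed batch" handling proposed in point (4), under the same assumptions: the window is covered by one extra batch whose tail is filled with samples from the start of the next window (or with padding), and quality is evaluated after that mixed batch.

```python
import math

BATCH_SIZE = 24
INTERVAL = 500_000

# Covering the full 500K-sample window takes 20,834 batches; the last one is
# "mixed": its first 8 samples complete this window and the remaining 16 come
# from the start of the next window (or are padding, if padding is used).
batches_in_window = math.ceil(INTERVAL / BATCH_SIZE)                        # 20834
samples_from_this_window = INTERVAL - (batches_in_window - 1) * BATCH_SIZE  # 8
samples_from_next_window = batches_in_window * BATCH_SIZE - INTERVAL        # 16

print(batches_in_window, samples_from_this_window, samples_from_next_window)
# -> 20834 8 16
```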

bitfort commented 4 years ago

SWG:

We believe the reference is not compliant with the rules, but since it is the reference, we will accept both the rule-compliant behavior and the reference behavior this round. Specifically, for this round we clarify that a submission is acceptable as long as each evaluation point falls within one batch size of the nominal 500K-sample boundary. We will revisit this as part of our general rework of BERT eval next round.
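
As a hedged illustration of the interim clarification (the helper name and the 8192 global batch size are our own assumptions, not from the rules): an eval point is treated as acceptable this round if it falls within one global batch size of the nominal boundary.

```python
def within_one_batch(samples_seen: int, nominal_boundary: int,
                     global_batch_size: int) -> bool:
    """Interim check: eval point within one batch size of the nominal boundary."""
    return abs(samples_seen - nominal_boundary) <= global_batch_size

# Example: a submission with an assumed global batch size of 8192 that
# evaluates at the batch boundary nearest each nominal point (3M, 3.5M, ...).
gbs = 8192
for i in range(4):
    nominal = 3_000_000 + i * 500_000
    actual = round(nominal / gbs) * gbs   # nearest whole number of batches
    print(nominal, actual, within_one_batch(actual, nominal, gbs))
```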