mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Bert Evaluation before 3M #351

Open bitfort opened 4 years ago

bitfort commented 4 years ago

We chose to start evaluating BERT at 3M samples under the belief that every submission would converge after 3M samples. We now have evidence this is not true. We want to discuss lifting this constraint (in light of the wrong assumption we made when writing the rules) and evaluating every 500K samples instead.
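The difference between the two schedules can be sketched as follows. This is only an illustrative sketch: the helper name and the total sample budget are hypothetical, not taken from the MLPerf rules; the only numbers from the discussion are the 3M-sample first-evaluation point and the 500K-sample interval.

```python
def eval_points(first_eval, interval, max_samples):
    """Return the sample counts at which evaluation would run,
    starting at first_eval and repeating every interval samples."""
    points = []
    n = first_eval
    while n <= max_samples:
        points.append(n)
        n += interval
    return points

# Current rule as described above: no evaluation before 3M samples.
# (5M is an assumed illustrative budget, not a number from the rules.)
current = eval_points(3_000_000, 500_000, 5_000_000)

# Proposed relaxation: evaluate every 500K samples from the start,
# so runs that converge before 3M samples can stop earlier.
proposed = eval_points(500_000, 500_000, 5_000_000)
```

Under these illustrative numbers, the current rule yields 5 evaluation points and the proposed one yields 10, which is the cost/benefit trade-off the thread is debating.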

jonathan-cohen-nvidia commented 4 years ago

We brought this up two weeks ago, and Google specifically objected to evaluating more frequently. The 3M number was proposed by a Google engineer.

I think any change at this point is bad: some companies may have already started their submission runs, since there are only 10 days left.

Suggest we defer this to 0.8.

bitfort commented 4 years ago

SWG:

We had a long discussion about BERT evaluation today. The resolution from this discussion:

We want to make sure we have the following takeaways from this conversation: