mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Number of Runs for v0.7 #297

Closed by bitfort 4 years ago

bitfort commented 4 years ago

The current proposal is:

Number of runs for v0.7:

Resnet - 5
Maskrcnn - 5
SSD - 5
Transformer - 10
GNMT - 10
Minigo - 10 (previously 20)
DLRM - 5 (NEW)
BERT - 10 (NEW)

Please review for decision next week.

bitfort commented 4 years ago

AI(Everyone) - Review DLRM number of runs.

frank-wei commented 4 years ago

Can we list the validation steps needed for BERT and DLRM? For DLRM, how many validation steps do we need, and is that related to batch size? For BERT, how many validations do we need? One per epoch?

christ1ne commented 4 years ago

We have agreement on DLRM for 5 runs over email.

bitfort commented 4 years ago

SWG:

Number of runs for v0.7:

Resnet - 5
Maskrcnn - 5
SSD - 5
Transformer - 10
GNMT - 10
Minigo - 10
DLRM - 5
BERT - 10
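
For anyone scripting against these numbers, here is a minimal sketch of the SWG counts above as a lookup table with a simple check. The names `REQUIRED_RUNS_V07` and `has_enough_runs` are hypothetical and not part of any MLPerf tooling. Treating the counts as a minimum is also an assumption, since the thread only lists the number of runs per benchmark.

```python
# Run counts copied from the SWG decision in this thread (v0.7).
# The dictionary and function names are illustrative, not an official API.
REQUIRED_RUNS_V07 = {
    "Resnet": 5,
    "Maskrcnn": 5,
    "SSD": 5,
    "Transformer": 10,
    "GNMT": 10,
    "Minigo": 10,
    "DLRM": 5,
    "BERT": 10,
}


def has_enough_runs(benchmark: str, num_runs: int) -> bool:
    """Return True if num_runs meets the listed count for the benchmark.

    Assumption: the listed count is treated as a minimum. Raises KeyError
    for a benchmark name that is not in the table.
    """
    return num_runs >= REQUIRED_RUNS_V07[benchmark]
```

For example, `has_enough_runs("BERT", 9)` would report that a submission with nine BERT runs falls short of the ten listed above.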