mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

BERT Reference for 1.0 #386

Open bitfort opened 4 years ago

bitfort commented 4 years ago

Tracking the BERT reference in preparation for 1.0.

bitfort commented 4 years ago

SWG:

Proposed improvements for BERT reference:

  1. Move the model from TF1 to TF2, with performance bug fixes
  2. Split the training and the evaluation datasets
  3. Enable gradient accumulation
  4. Use online evaluation
  5. Add MLPerf logging
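
For readers unfamiliar with item 3, here is a minimal sketch of gradient accumulation in plain Python. It is not the reference's TF2 code: the toy one-parameter model and the names `grad_mse`, `accumulated_grad`, and `num_micro_batches` are hypothetical and chosen only for illustration. It shows that averaging the gradients of equal-sized micro-batches gives the same gradient as the full batch, which is why accumulation can emulate a large batch on limited memory.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, num_micro_batches):
    """Average per-micro-batch gradients over equal-sized micro-batches."""
    if len(batch) % num_micro_batches != 0:
        raise ValueError("batch size must be divisible by num_micro_batches")
    size = len(batch) // num_micro_batches
    total = 0.0
    for i in range(num_micro_batches):
        # Only one micro-batch is processed at a time; its gradient is
        # added to a running sum instead of being applied immediately.
        micro = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro)
    # Dividing by the number of micro-batches recovers the full-batch mean.
    return total / num_micro_batches


def sgd_step(w, grad, lr):
    """Apply a single optimizer update after accumulation has finished."""
    return w - lr * grad


# Toy data (made up for this sketch): (x, y) pairs.
batch = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w0 = 0.5

full = grad_mse(w0, batch)
accum = accumulated_grad(w0, batch, num_micro_batches=2)
w1 = sgd_step(w0, accum, lr=0.01)
```

The equivalence holds here because the micro-batches are equal in size and the loss is a mean over examples. With unequal micro-batches, each gradient would need to be weighted by its example count.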

AI (action item): Preview convergence curves to better understand the impact of these changes.

johntran-nv commented 3 years ago

Adding PRs: https://github.com/mlcommons/training/pull/435 and https://github.com/mlcommons/training/pull/434