mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Include LLM training rules #514

Closed: ShriyaPalsamudram closed 1 year ago

ShriyaPalsamudram commented 1 year ago

High-level list of rule changes:

  1. Minimum number of GPUs needed to train an LLM reference
  2. LLM metadata preprocessing need not be timed
  3. Add the quality metric, dataset, model details, hyperparameters, and constraints
  4. Add a note on the fixed learning rate and restrictions on hyperparameter search (see the sketch after this list)
  5. Reference Convergence Point (RCP) generation exception and timeline
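
For item 4, a minimal sketch of what a fixed-hyperparameter launch could look like, assuming a Megatron-LM-style trainer. The flag names follow Megatron-LM conventions; the values are placeholders for illustration, not the values this PR adopts:

```bash
# Sketch only: pin the hyperparameters the proposed rules constrain, so no
# hyperparameter search is performed. All values below are placeholders.
python pretrain_gpt.py \
  --global-batch-size 1536 \
  --lr 2.0e-5 \
  --min-lr 2.0e-6 \
  --lr-decay-style cosine \
  --lr-warmup-samples 1000000 \
  --lr-decay-samples 100000000
```

The point of fixing the schedule this way is that every submitter runs the same learning-rate trajectory, which is what makes RCP-based convergence comparisons meaningful.
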
github-actions[bot] commented 1 year ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

nv-rborkar commented 1 year ago

Training WG meeting 02/09: @itayhubara @sgpyc assigned for review

itayhubara commented 1 year ago

Looks good to me. I would consider adding that dropout is zero: `attention-dropout 0.0`, `hidden-dropout 0.0`.
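
In a Megatron-LM-style launch, the suggested setting corresponds to the flags below (a sketch; the flag names are Megatron-LM's, and the remaining flags would be as in the fixed-hyperparameter sketch earlier in the thread):

```bash
# Disable both dropouts, per the suggestion above (Megatron-LM flag names).
python pretrain_gpt.py \
  --attention-dropout 0.0 \
  --hidden-dropout 0.0
```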

johntran-nv commented 1 year ago

Training WG approved today.