ResNet-50: Clarification on constraints on LR schedules

mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes

https://mlcommons.org/en/groups/training

Apache License 2.0

Open mrinaliyer opened 3 years ago

mrinaliyer commented 3 years ago

The RN50 rules are not clear about the following:

Previous rules allowed stepped LR schedules. Are they still permitted?
Is cosine LR permitted?
There is an end_learning_rate of 1e-4 as a constraint. Shouldn't this depend on batch size? Is there a reason that the end LR is fixed for SGD-M? Shouldn't it be a fraction of the starting LR?
Is there a document anywhere with a comprehensive list of changes to the RN50 rules since earlier submissions?
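
To make the questions above concrete, here is a rough sketch of the three schedule shapes being discussed. It is not the MLPerf reference implementation. The function names, the power of 2 in the polynomial decay, the step boundaries, and the linear scaling rule (starting LR = 0.1 * batch_size / 256) are all assumptions picked for illustration. Only the end_learning_rate value of 1e-4 comes from the rules as quoted above.

```python
import math

END_LR = 1e-4  # the fixed end_learning_rate quoted in the rules


def poly_decay(step, total_steps, base_lr, end_lr=END_LR, power=2.0):
    """Polynomial decay from base_lr to end_lr (power=2.0 is an assumption)."""
    frac = min(step, total_steps) / total_steps
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr


def cosine_decay(step, total_steps, base_lr, end_lr=0.0):
    """Half-cosine decay from base_lr to end_lr."""
    frac = min(step, total_steps) / total_steps
    return end_lr + 0.5 * (base_lr - end_lr) * (1.0 + math.cos(math.pi * frac))


def stepped_decay(step, base_lr, boundaries, factor=0.1):
    """Multiply the LR by `factor` at each boundary that has been passed."""
    passed = sum(1 for b in boundaries if step >= b)
    return base_lr * factor ** passed


def end_lr_fraction(batch_size, end_lr=END_LR):
    """Fixed end LR as a fraction of the starting LR.

    Assumes the common linear scaling heuristic for the starting LR,
    which is an illustration only and not taken from the rules.
    """
    base_lr = 0.1 * batch_size / 256
    return end_lr / base_lr


if __name__ == "__main__":
    for bs in (256, 1024, 4096):
        print(bs, end_lr_fraction(bs))
```

Under these assumed numbers, a fixed end LR of 1e-4 is 0.1% of the starting LR at batch size 256 but only about 0.006% at batch size 4096, which is the mismatch the third question is getting at.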