Previous rules allowed stepped LR schedules. Are they still permitted?
Is cosine LR permitted?
There is a constraint of end_learning_rate = 1e-4. Shouldn't this depend on batch size? Is there a reason the end LR is fixed for SGD-M? Shouldn't it be a fraction of the starting LR?
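To make the end-LR concern concrete, here is a minimal sketch. It assumes a polynomial decay schedule and a linear LR scaling rule of 0.1 per 256 samples; both are illustrative assumptions on my part, not values taken from the rules, and `polynomial_decay` is a hypothetical helper. Only the 1e-4 end value comes from the constraint above.

```python
FIXED_END_LR = 1e-4  # the end_learning_rate constraint quoted above


def polynomial_decay(step, total_steps, start_lr, end_lr, power=2.0):
    """Decay from start_lr to end_lr over total_steps (assumed schedule shape)."""
    frac = min(step, total_steps) / total_steps
    return (start_lr - end_lr) * (1.0 - frac) ** power + end_lr


def end_to_start_ratio(batch_size, base_lr=0.1, base_batch=256):
    """Ratio of the fixed end LR to a linearly scaled starting LR (assumed rule)."""
    start_lr = base_lr * batch_size / base_batch
    return FIXED_END_LR / start_lr


if __name__ == "__main__":
    # With a fixed end LR, the end/start ratio shrinks as the batch size grows.
    for batch_size in (256, 4096, 32768):
        print(f"batch={batch_size:6d} end/start={end_to_start_ratio(batch_size):.2e}")
```

Under these assumptions the schedule's dynamic range differs by roughly two orders of magnitude between the smallest and largest batch sizes, which is why a fractional end LR seems more natural to me.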
Is there a document anywhere with a comprehensive list of changes to the RN50 rules since earlier submissions?
The RN50 rules are not clear about the following: