zhanyuanyang closed this issue 3 years ago
Hi, generally the parameters to adjust are those of the LR scheduler, such as the decay step and decay strength. I have recently found that a cosine LR scheduler without annealing works well, since it needs no scheduler hyperparameters any more; you only need to find a good starting LR.
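For reference, here is a minimal sketch of what a plain cosine schedule looks like, in pure Python. It is not taken from train_pretrain.py; the function name `cosine_lr`, the `min_lr` floor, and the example values (`base_lr = 0.1`, 100 steps) are assumptions made up for illustration. The point is that the starting LR is the only value left to tune once the training length is fixed.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Return the learning rate at `step` on a single cosine decay curve.

    Hypothetical helper for illustration: the LR starts at `base_lr`
    and decays smoothly to `min_lr` over `total_steps`, with no restarts.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that steps outside [0, total_steps] stay on the curve ends.
    step = min(max(step, 0), total_steps)
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos_factor


if __name__ == "__main__":
    base_lr = 0.1      # assumed starting LR, the only value to search over
    total_steps = 100  # assumed training length
    for step in (0, 25, 50, 75, 100):
        print(step, cosine_lr(step, total_steps, base_lr))
```

If you train with PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` follows the same curve when `T_max` is set to the total number of scheduler steps, so you may not need a custom function at all.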
What are the best hyperparameters for pre-training on TieredImageNet?
Should I keep the hyperparameters in train_pretrain.py unchanged?