Open alexshepard opened 9 months ago
We currently use staircase learning rate decay, but it might make more sense to use learning rate warmup followed by cosine decay, using https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/CosineDecay
On smaller variants of the iNat datasets, this has been shown to improve overall accuracy.
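
For reference, a rough sketch of the schedule shape being proposed: linear warmup, then cosine decay. This is plain Python, not our training code; the function name `warmup_cosine_lr` and all of its parameters are made up for illustration. Recent TensorFlow releases appear to support warmup directly on `CosineDecay` via `warmup_target` and `warmup_steps`, but that should be checked against the installed version.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, decay_steps,
                     initial_lr=0.0, alpha=0.0):
    """Hypothetical helper: learning rate at `step`.

    Linear warmup from `initial_lr` to `base_lr` over `warmup_steps`,
    then cosine decay from `base_lr` to `alpha * base_lr` over `decay_steps`.
    Assumes decay_steps > 0.
    """
    if step < warmup_steps:
        # Linear ramp up to the peak learning rate.
        return initial_lr + (base_lr - initial_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 once decay ends.
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # alpha sets the floor as a fraction of base_lr.
    return base_lr * ((1.0 - alpha) * cosine + alpha)


if __name__ == "__main__":
    # Illustrative values only, not tuned for any iNat dataset.
    for s in (0, 500, 1000, 5500, 10000):
        print(s, warmup_cosine_lr(s, base_lr=0.1, warmup_steps=1000, decay_steps=9000))
```

Compared with staircase decay, this gives a smooth curve with no sudden drops, and the warmup avoids starting training at the full learning rate.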