Add learning rate warmup

neurostatslab / vocalocator

Deep neural networks for sound source localization and vocalization attribution.

MIT License

Add learning rate warmup #26

Closed Aramist closed 1 year ago

Aramist commented 1 year ago

Models with running statistics (optimizer momentum, batch norm, layer norm) generally benefit from a short initial period with a zero or linearly ramped learning rate. This gives those statistics time to converge to good estimates of the population values, so that their arbitrary initial values do not throw off the first weight updates.
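
As a rough illustration of the schedule described above, here is a minimal sketch of a warmup multiplier: zero for an optional hold period, then a linear ramp up to the full learning rate. The function name `warmup_factor` and the parameters `hold_steps` and `ramp_steps` are hypothetical and are not taken from the vocalocator code.

```python
def warmup_factor(step: int, hold_steps: int = 0, ramp_steps: int = 1000) -> float:
    """Return the multiplier applied to the base learning rate at `step`.

    Hypothetical helper for illustration only:
      - steps [0, hold_steps): factor is 0.0 (no weight updates)
      - next `ramp_steps` steps: factor rises linearly from 0.0 to 1.0
      - afterwards: factor stays at 1.0
    """
    if step < hold_steps:
        return 0.0
    if ramp_steps <= 0:
        # No ramp requested: jump straight to the full learning rate.
        return 1.0
    progress = (step - hold_steps) / ramp_steps
    return min(1.0, progress)


if __name__ == "__main__":
    # Print the factor at a few steps for a 10-step hold and 100-step ramp.
    for s in (0, 10, 35, 60, 110):
        print(s, warmup_factor(s, hold_steps=10, ramp_steps=100))
```

Assuming a PyTorch training loop, a function like this could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies each parameter group's base learning rate by the returned factor. Whether the hold and ramp are counted in steps or epochs depends on how often the scheduler is stepped.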