stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Linear lr warmup in pretrain. #153

Closed (twuebi closed 4 years ago)

twuebi commented 4 years ago

Add linear learning rate warmup to pretrain.


The current implementation already starts decaying the learning rate during the warmup. I don't think this is a big problem, since warmup_steps is usually a very small fraction of the total number of training steps.
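
To make the overlap concrete, here is a minimal sketch of the two orderings. This is not the code from this PR: the exponential decay form and the names base_lr, decay_rate, decay_steps, lr_overlapping and lr_sequential are assumptions chosen for illustration; only warmup_steps comes from the description above.

```python
def warmup_factor(step: int, warmup_steps: int) -> float:
    """Linear ramp from 0 to 1 over warmup_steps, then constant 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def exp_decay(step: int, decay_rate: float, decay_steps: int) -> float:
    """Exponential decay factor (assumed decay form, for illustration only)."""
    return decay_rate ** (step / decay_steps)


def lr_overlapping(step, base_lr, warmup_steps, decay_rate, decay_steps):
    """Decay is counted from step 0, so it is already active during warmup."""
    return (base_lr
            * warmup_factor(step, warmup_steps)
            * exp_decay(step, decay_rate, decay_steps))


def lr_sequential(step, base_lr, warmup_steps, decay_rate, decay_steps):
    """Hypothetical alternative: decay only counts steps after warmup."""
    decay_step = max(0, step - warmup_steps)
    return (base_lr
            * warmup_factor(step, warmup_steps)
            * exp_decay(decay_step, decay_rate, decay_steps))


if __name__ == "__main__":
    # Hypothetical settings: warmup is 1% of one decay period.
    args = dict(base_lr=0.01, warmup_steps=100, decay_rate=0.5, decay_steps=10_000)
    for step in (0, 50, 100, 1_000):
        print(step, lr_overlapping(step, **args), lr_sequential(step, **args))
```

With these hypothetical settings, the overlapping schedule peaks slightly below base_lr at the end of warmup, while the sequential one reaches base_lr exactly. The gap shrinks as warmup_steps gets smaller relative to decay_steps, which is the reason given above for not treating the overlap as a big problem.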

At some later point, we should probably unify the learning rate scheduling between train and pretrain.