Hi Arrow,
I observed that after pre-training stage 1, the parameters of BERT changed very little from their initialized values. Is this because the parameter `coef_lr` is working? It was set to 0.1 in the 1st stage and to 1 in the 2nd stage. I guess this is to prevent BERT from being damaged at the beginning of training.
https://github.com/microsoft/UniVL/blob/0a7c07f566a3b220731f4abcaa6e1ee59a686596/main_pretrain.py#L383-L385
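For context, here is a minimal sketch of what such a coefficient typically does: it scales the learning rate of the pretrained backbone's parameter group relative to the newly initialized modules. The group names and values below are illustrative, not the repo's exact code (see the linked `main_pretrain.py` for the real grouping).

```python
# Sketch of per-group learning-rate scaling (framework-free; hypothetical names).
base_lr = 1e-4
coef_lr = 0.1  # stage-1 value from the question; 1 in stage 2

# Mimic optimizer parameter groups: the pretrained BERT group gets a scaled
# learning rate, so its weights move ~10x more slowly than the new modules'.
param_groups = [
    {"name": "bert", "lr": base_lr * coef_lr},
    {"name": "new_modules", "lr": base_lr},
]

for group in param_groups:
    print(group["name"], group["lr"])
```

With `coef_lr = 0.1`, the BERT group's effective learning rate is one tenth of the base rate, which would explain why its parameters barely move in stage 1.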
By the way, you named `no_decay_xxx` with the decay coefficient, and named `decay_xxx` without the decay coefficient. Are these typos?
https://github.com/microsoft/UniVL/blob/0a7c07f566a3b220731f4abcaa6e1ee59a686596/main_pretrain.py#L191-L194
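For reference, the conventional BERT-style grouping excludes bias and LayerNorm parameters from weight decay. The sketch below (hypothetical parameter names, not the repo's code) shows that convention; if the repo's variable names are swapped, it may just be a naming typo while the `weight_decay` values are still correct.

```python
# Conventional BERT-style weight-decay grouping (a sketch, not UniVL's exact
# code): bias and LayerNorm parameters get no weight decay.
no_decay = ["bias", "LayerNorm.weight"]

# Hypothetical parameter names standing in for model.named_parameters().
named_params = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.LayerNorm.weight",
]

decay_group = [n for n in named_params if not any(nd in n for nd in no_decay)]
no_decay_group = [n for n in named_params if any(nd in n for nd in no_decay)]

grouped = [
    {"params": decay_group, "weight_decay": 0.01},  # regular weights decay
    {"params": no_decay_group, "weight_decay": 0.0},  # bias/LayerNorm do not
]
```

Under this convention, the group built from the `no_decay` name list is the one with `weight_decay=0.0`, which is why the naming in the linked lines looks inverted.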