stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Set the default number of warmup steps to 2000 #171

Closed by danieldk 4 years ago

danieldk commented 4 years ago

Motivation (Ma & Yarats, 2019):

"We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over 2·(1−β_2)^−1 training iterations."

We use the default TensorFlow Adam hyperparameters, where β_2 = 0.999, so the suggested default works out to 2·(1−0.999)^−1 = 2000 warmup steps.
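
For reference, the arithmetic and the shape of a linear warmup can be sketched in a few lines of Python. This is only an illustration of the rule of thumb quoted above, not sticker's actual implementation: the function names `default_warmup_steps` and `linear_warmup_factor` are hypothetical, and counting steps from 0 is an assumption.

```python
def default_warmup_steps(beta2: float) -> int:
    """Rule of thumb from the quote: 2 * (1 - beta2)^-1 training iterations."""
    # round() absorbs floating-point error in (1 - beta2).
    return round(2.0 / (1.0 - beta2))


def linear_warmup_factor(step: int, warmup_steps: int) -> float:
    """Scale factor for the base learning rate at a given step.

    Grows linearly from 0 to 1 over `warmup_steps` steps, then stays at 1.
    Assumes steps are counted from 0 (an assumption of this sketch).
    """
    if warmup_steps <= 0:
        return 1.0  # no warmup requested
    return min(1.0, step / warmup_steps)


if __name__ == "__main__":
    steps = default_warmup_steps(0.999)
    print(steps)
    print(linear_warmup_factor(1000, steps))
```

With β_2 = 0.999 the learning rate would reach its full value at step 2000 and stay there; any decay schedule applied afterwards is outside the scope of this sketch.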

Fixes #169.

danieldk commented 4 years ago

cc @DiveFish: I saw that you were training a sticker model the other day. You probably want to use the new default. The improvements for Dutch and German are roughly 0.6% and 0.5% LAS, respectively.

DiveFish commented 4 years ago

Thanks for the pointer!