tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial

How does the learning rate decay? #304

Closed yapingzhao closed 6 years ago

yapingzhao commented 6 years ago

Hi,

optimizer

```python
parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam")
parser.add_argument("--learning_rate", type=float, default=1.0,
                    help="Learning rate. Adam: 0.001 | 0.0001")
parser.add_argument("--warmup_steps", type=int, default=0,
                    help="How many steps we inverse-decay learning.")
parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""\
      How to warmup learning rates. Options include:
        t2t: Tensor2Tensor's way, start with lr 100 times smaller, then
             exponentiate until the specified lr.\
      """)
```
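(For context, here is a minimal sketch of what the t2t warmup scheme described in the help text above does, assuming the usual Tensor2Tensor-style formula: start roughly 100x below the base rate and grow exponentially until warmup_steps is reached. The function name and the plain-Python form are illustrative, not the repository's actual TensorFlow code.)

```python
import math

def warmup_learning_rate(base_lr, global_step, warmup_steps):
    """Hypothetical sketch of t2t-style warmup: start ~100x smaller than
    base_lr and grow exponentially until base_lr is reached at warmup_steps."""
    if warmup_steps <= 0 or global_step >= warmup_steps:
        return base_lr  # warmup disabled (the default, warmup_steps=0) or already finished
    # Per-step growth factor chosen so the rate climbs from 0.01 * base_lr
    # at step 0 up to base_lr at step warmup_steps.
    warmup_factor = math.exp(math.log(0.01) / warmup_steps)
    inv_decay = warmup_factor ** (warmup_steps - global_step)
    return inv_decay * base_lr
```

With the defaults shown above (warmup_steps=0), this would return the base learning rate unchanged from step 0 onward.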

Is the learning rate set to 1.0 when training the model? Isn't that a bit too large? And why does the learning rate printed during training always stay at 1? I don't understand how this learning rate decays. Looking forward to your advice or answers.

Best regards,
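As far as I can tell, with warmup_steps=0 and no decay scheme requested on the command line (the repo exposes a separate decay-scheme flag that is not shown in the snippet above), the rate is simply held at the base value for the whole run, which would explain the constant 1.0 in the training log. Below is a rough sketch of the kind of staged-halving decay the repo's luong-style schemes describe; the start fraction, number of halvings, and function name are illustrative assumptions, not the exact values from the code.

```python
def decayed_learning_rate(base_lr, global_step, num_train_steps,
                          start_fraction=2.0 / 3.0, decay_times=4,
                          decay_factor=0.5):
    """Hypothetical sketch of a staged-halving schedule: keep base_lr for the
    first part of training, then halve it at evenly spaced intervals."""
    start_decay_step = int(num_train_steps * start_fraction)
    if global_step < start_decay_step:
        return base_lr  # no decay yet
    remain_steps = num_train_steps - start_decay_step
    decay_every = max(1, remain_steps // decay_times)
    num_decays = min(decay_times, (global_step - start_decay_step) // decay_every)
    return base_lr * (decay_factor ** num_decays)
```

With base_lr=1.0 and no decay scheme applied, the logged rate stays at 1.0 for the entire run; when switching --optimizer to adam, a much smaller base rate such as 0.001 is typically used, as the help string suggests.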

YangFei1990 commented 5 years ago

Have you solved this problem? I'm running into the same issue.