parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam")
parser.add_argument("--learning_rate", type=float, default=1.0,
help="Learning rate. Adam: 0.001 | 0.0001")
parser.add_argument("--warmup_steps", type=int, default=0,
help="How many steps we inverse-decay learning.")
parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""\
How to warmup learning rates. Options include:
t2t: Tensor2Tensor's way, start with lr 100 times smaller, then
exponentiate until the specified lr.\
""")
Is the learning rate really set to 1 when training the model? Isn't that a bit too big? Why is the learning rate printed during training always 1? I don't understand how this learning rate decays.
Looking forward to your advice or answers.
Best regards,
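For reference, here is a minimal sketch of the "t2t" warmup as the help string describes it (start with a learning rate 100 times smaller, then grow it exponentially until it reaches the specified lr at the end of warmup). The function name and the NumPy formulation are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def t2t_warmup_lr(step, base_lr=1.0, warmup_steps=1000):
    """Hypothetical illustration of the 't2t' warmup in the help string:
    start at base_lr / 100 and grow exponentially so the rate reaches
    base_lr at step == warmup_steps. Not the repository's actual code."""
    if warmup_steps == 0 or step >= warmup_steps:
        return base_lr
    # Choose factor so that factor ** warmup_steps == 0.01,
    # i.e. the rate is 100 times smaller at step 0.
    warmup_factor = np.exp(np.log(0.01) / warmup_steps)
    return base_lr * warmup_factor ** (warmup_steps - step)

# With the default warmup_steps=0 the warmup is a no-op, so the printed
# learning rate would simply stay at the flag value, e.g. 1.0 for SGD,
# unless some separate decay scheme is also configured.
for step in [0, 250, 500, 1000, 2000]:
    print(step, t2t_warmup_lr(step, base_lr=1.0, warmup_steps=1000))
```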
Hi,
parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam") parser.add_argument("--learning_rate", type=float, default=1.0, help="Learning rate. Adam: 0.001 | 0.0001") parser.add_argument("--warmup_steps", type=int, default=0, help="How many steps we inverse-decay learning.") parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""\ How to warmup learning rates. Options include: t2t: Tensor2Tensor's way, start with lr 100 times smaller, then exponentiate until the specified lr.\ """)
Is the learning rate set to 1 when training the model, is it a bit too big? Why is the learning rate output when training the model always 1? I don't understand how this learning rate decays. Looking forward to your advice or answers. Best regards,