parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam")
parser.add_argument("--learning_rate", type=float, default=1.0,
help="Learning rate. Adam: 0.001 | 0.0001")
parser.add_argument("--warmup_steps", type=int, default=0,
help="How many steps we inverse-decay learning.")
parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""\
How to warmup learning rates. Options include:
t2t: Tensor2Tensor's way, start with lr 100 times smaller, then
exponentiate until the specified lr.\
""")
Is the learning rate really set to 1 when training the model? Isn't that a bit too big? Why is the learning rate printed during training always 1? I don't understand how this learning rate decays.
Looking forward to your advice or answers.
Best regards,
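For reference, here is a minimal sketch of the "t2t" warmup as the help string describes it (start with a learning rate 100 times smaller, then grow it exponentially until it reaches the specified lr at the end of warmup). The function name and the NumPy formulation are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def t2t_warmup_lr(step, base_lr=1.0, warmup_steps=1000):
    """Hypothetical illustration of the 't2t' warmup in the help string:
    start at base_lr / 100 and grow exponentially so the rate reaches
    base_lr at step == warmup_steps. Not the repository's actual code."""
    if warmup_steps == 0 or step >= warmup_steps:
        return base_lr
    # Choose factor so that factor ** warmup_steps == 0.01,
    # i.e. the rate is 100 times smaller at step 0.
    warmup_factor = np.exp(np.log(0.01) / warmup_steps)
    return base_lr * warmup_factor ** (warmup_steps - step)

# With the default warmup_steps=0 the warmup is a no-op, so the printed
# learning rate would simply stay at the flag value, e.g. 1.0 for SGD,
# unless some separate decay scheme is also configured.
for step in [0, 250, 500, 1000, 2000]:
    print(step, t2t_warmup_lr(step, base_lr=1.0, warmup_steps=1000))
```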
Hi,
parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam") parser.add_argument("--learning_rate", type=float, default=1.0, help="Learning rate. Adam: 0.001 | 0.0001") parser.add_argument("--warmup_steps", type=int, default=0, help="How many steps we inverse-decay learning.") parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""\ How to warmup learning rates. Options include: t2t: Tensor2Tensor's way, start with lr 100 times smaller, then exponentiate until the specified lr.\ """)
Is the learning rate set to 1 when training the model, is it a bit too big? Why is the learning rate output when training the model always 1? I don't understand how this learning rate decays. Looking forward to your advice or answers. Best regards,