Open LEEYOONHYUNG opened 3 years ago
Hi. Thank you for your implementation. I have a question about the optimizer: it seems that you use the Adam optimizer with lr=1e-3 and amsgrad=True.
Why did you choose these options, especially the learning rate, when the original paper says the model was trained with lr=1e-4?
Did the model fail to train with lr=1e-4 or amsgrad=False?
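
To make the question concrete, here is a minimal pure-Python sketch of how I understand the two settings to differ. It is not taken from this repository: `adam_step`, `new_state`, and `run` are hypothetical names, the update is the standard bias-corrected Adam rule on a single scalar parameter, and the AMSGrad branch assumes the common variant that keeps a running maximum of the second-moment estimate.

```python
import math


def new_state():
    # Per-parameter optimizer state: step count, first and second moment
    # estimates, and the running maximum of the second moment (AMSGrad only).
    return {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, amsgrad=False):
    """Apply one Adam (or AMSGrad) update to a scalar parameter."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    if amsgrad:
        # AMSGrad: the denominator never shrinks, so the effective step
        # size cannot grow again after a large gradient has been seen.
        state["v_max"] = max(state["v_max"], state["v"])
        v = state["v_max"]
    else:
        v = state["v"]
    m_hat = state["m"] / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def run(grads, lr, amsgrad, start=1.0):
    # Feed a fixed sequence of gradients and return the final parameter.
    state = new_state()
    param = start
    for g in grads:
        param = adam_step(param, g, state, lr=lr, amsgrad=amsgrad)
    return param


if __name__ == "__main__":
    grads = [10.0, 0.1, 0.1]  # one large gradient, then small ones
    for lr in (1e-3, 1e-4):
        for amsgrad in (False, True):
            print(lr, amsgrad, run(grads, lr, amsgrad))
```

In this sketch the first update has a magnitude of roughly lr regardless of the gradient scale, so lr=1e-3 moves the parameter about ten times further per step than lr=1e-4. After a large gradient, the amsgrad=True run takes slightly smaller steps than plain Adam, which is the behavior I am wondering whether training depends on.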