Closed IIAT-MR-LL closed 4 years ago
@IIAT-MR-LL In our follow-up experiments, reducing the LR, even with a rough schedule, gives us more stable results. Consider using an LR scheduler. We did not change the weight decay or other parameters (we used the default Adam hyperparameters, except for the LR).
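For reference, a "rough schedule" on top of the constant 1e-5 base LR could look like a simple step decay. The `step_size` and `gamma` values below are hypothetical examples for illustration, not values from the paper; the same effect can be had with `torch.optim.lr_scheduler.StepLR` on top of `torch.optim.Adam` (whose defaults are `betas=(0.9, 0.999)` and `weight_decay=0`):

```python
def step_lr(base_lr, epoch, step_size=50, gamma=0.5):
    """Step-decay schedule: multiply the base LR by `gamma`
    every `step_size` epochs (values here are illustrative)."""
    return base_lr * (gamma ** (epoch // step_size))

# Base LR from the paper is 1e-5; the schedule then halves it periodically.
for epoch in (0, 50, 100):
    print(epoch, step_lr(1e-5, epoch))
```

In PyTorch this would correspond to constructing the optimizer with `lr=1e-5` and stepping a `StepLR` scheduler once per epoch.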
@seoungwugoh Thanks for the guidance.
Hi, I would like to know some details of the Adam optimizer configuration. In the paper, you mention using a constant learning rate of 1e-5, but you do not mention the weight decay, which is also important for optimization. Would you mind sharing the Adam hyperparameter settings (i.e., weight_decay and betas)?
Thanks