sIncerass closed this issue 4 years ago.
Hi @lipiji, thanks for the implementation. One thing I am wondering: did you use Adagrad instead of Adam (with warmup) to obtain the reported scores?
@sIncerass Yes. Adam also works, and it may achieve better performance with warmup and a better learning-rate scheduler.
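For anyone trying the Adam route, here is a minimal sketch of one common "warmup + scheduler" shape: linear warmup to a peak learning rate, then decay proportional to 1/sqrt(step) (the same shape as the schedule used for the original Transformer). The function name `warmup_inverse_sqrt_lr` and the defaults `peak_lr=1e-3`, `warmup_steps=4000` are hypothetical illustrations, not values taken from this repository.

```python
import math


def warmup_inverse_sqrt_lr(step: int, peak_lr: float = 1e-3, warmup_steps: int = 4000) -> float:
    """Learning rate for a given optimizer step (1-indexed).

    Assumption: linear warmup followed by inverse-square-root decay.
    peak_lr and warmup_steps are illustrative defaults, not the repo's settings.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    warmup_factor = step / warmup_steps            # grows linearly until step == warmup_steps
    decay_factor = math.sqrt(warmup_steps / step)  # equals 1 at warmup_steps, then shrinks
    # The smaller factor wins: warmup dominates early, decay dominates afterwards.
    return peak_lr * min(warmup_factor, decay_factor)


if __name__ == "__main__":
    for s in (1, 1000, 4000, 16000):
        print(s, warmup_inverse_sqrt_lr(s))
```

The rate peaks exactly at `warmup_steps` (1e-3 here) and is halved by 4x that step count. If the training loop uses PyTorch, the same shape can be fed to `torch.optim.Adam` through `torch.optim.lr_scheduler.LambdaLR`; note that `LambdaLR` multiplies the optimizer's base lr by the returned value and starts counting at 0, so pass the ratio `warmup_inverse_sqrt_lr(max(step, 1)) / peak_lr` rather than the raw rate.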