Closed by zachluo 4 years ago
Reptile benefits from Adam's second-moment estimate, but the momentum part of Adam actually hurts performance. Thus we disable momentum but keep the second moment at test time.
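For illustration only, here is a toy scalar sketch of what "disable momentum but keep the second moment" means in the standard Adam update rule. It is not code from this repository: `adam_step` and the `state` dict are hypothetical names, and setting `beta1=0` is assumed here as the way momentum is switched off.

```python
import math

def adam_step(param, grad, state, lr=1e-3, beta1=0.0, beta2=0.999, eps=1e-8):
    """One scalar Adam update; state holds step count t and moments m, v."""
    state["t"] += 1
    # First moment (momentum). With beta1 = 0 this is just the current gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Second moment: running average of squared gradients, still active.
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Standard bias correction for both moments.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
w = 1.0
for g in [0.5, -2.0, 0.1]:
    w = adam_step(w, g, state)
    # No gradient history leaks into m when beta1 = 0.
    assert state["m"] == g
```

With `beta1=0` the update reduces to a per-parameter rescaling of the current gradient by the running second-moment estimate, which is the part reported above as helpful.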
Btw, MAML uses vanilla SGD for test time training, which makes it perform worse than it could if it used Adam. See implicit MAML, which fixes this problem and beats Reptile performance.
@unixpickle Thanks for pointing that out; it's an interesting point.
It seems that training and evaluation share the same model.minimize_op. If so, evaluation will use the moving averages stored in the optimizer when fine-tuning. In the original MAML code, the evaluation code restores only the trainable parameters, excluding the other statistics in the optimizer. Will this make any difference?
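To make the question concrete, here is a toy scalar sketch, not code from either repository. It compares the first fine-tuning step of Adam with `beta1=0` when the second-moment state is carried over from training versus when it is reset. The carried-over values (`t=1000`, `v=4.0`) are made up for the example, and `adam_step` is a hypothetical helper.

```python
import math

def adam_step(param, grad, state, lr=1e-3, beta1=0.0, beta2=0.999, eps=1e-8):
    """One scalar Adam update; state holds step count t and moments m, v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

w0, grad = 1.0, 0.5

# Case 1: optimizer statistics reset before fine-tuning (only weights restored).
fresh = {"t": 0, "m": 0.0, "v": 0.0}
fresh_step = abs(adam_step(w0, grad, fresh) - w0)

# Case 2: optimizer statistics carried over from training (hypothetical values).
carried = {"t": 1000, "m": 0.0, "v": 4.0}
carried_step = abs(adam_step(w0, grad, carried) - w0)

# The same gradient produces a different step size in the two cases.
print(fresh_step, carried_step)
```

In this toy setting the carried-over second moment shrinks the first step because earlier gradients were larger. Whether that matters in practice for the shared `minimize_op` would need to be checked against the actual evaluation code.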