openai / supervised-reptile

Code for the paper "On First-Order Meta-Learning Algorithms"
https://arxiv.org/abs/1803.02999
MIT License

moving average in AdamOptimizer when conducting evaluation #28

Closed · zachluo closed this 4 years ago

zachluo commented 4 years ago

It seems like training and evaluation share the same model.minimize_op. If so, evaluation will use the optimizer's moving averages when fine-tuning. In the original MAML code, evaluation restores only the trainable parameters, excluding the other statistics kept by the optimizer. Does this make any difference?
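Roughly, the situation is the following (a minimal TF1-style sketch with illustrative names, not the repository's actual code): because both phases reuse the same minimize op, Adam's slot variables persist into test-time fine-tuning, whereas a MAML-style evaluation would restore only the trainable variables.

```python
import tensorflow as tf

# Illustrative sketch, not the repo's code: a shared minimize_op means
# Adam's moving averages carry over from meta-training into test-time
# fine-tuning.
x = tf.placeholder(tf.float32, [None, 784])
w = tf.get_variable('w', [784, 5])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

optimizer = tf.train.AdamOptimizer(1e-3)
minimize_op = optimizer.minimize(loss)  # reused for both training and eval

# Adam's per-parameter statistics live in slot variables:
m = optimizer.get_slot(w, 'm')  # first-moment (momentum) moving average
v = optimizer.get_slot(w, 'v')  # second-moment moving average

# A MAML-style evaluation would save/restore only the trainable weights,
# leaving the optimizer statistics out of the snapshot:
weights_only_saver = tf.train.Saver(tf.trainable_variables())
```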

unixpickle commented 4 years ago

Reptile benefits from Adam's second-moment estimate, but the momentum part of Adam actually hurts performance. Thus we disable momentum but keep the second moment at test time.
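In TF1 terms, that choice amounts to something like the following sketch (the learning rate here is a placeholder, not a tuned value):

```python
import tensorflow as tf

# Sketch of the idea: beta1=0 zeroes out Adam's momentum (first-moment)
# average, while the second-moment estimate v is still tracked and used
# to scale per-parameter step sizes.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3, beta1=0.0)
```

With beta1=0, the update direction is just the current gradient, rescaled per parameter by sqrt(v) plus epsilon.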

unixpickle commented 4 years ago

Btw, MAML uses vanilla SGD for test-time training, which makes it perform worse than it could if it used Adam. See implicit MAML, which fixes this problem and beats Reptile's performance.
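For contrast, the two test-time fine-tuning choices being compared look roughly like this (learning rates are placeholders, not values from either codebase):

```python
import tensorflow as tf

# Rough contrast of the two test-time choices discussed above:
maml_test_opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)    # vanilla SGD
reptile_test_opt = tf.train.AdamOptimizer(learning_rate=1e-3, beta1=0.0)  # Adam sans momentum
```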

zachluo commented 4 years ago

@unixpickle Thanks for pointing this out, that's an interesting point.