tristandeleu / pytorch-maml-rl

Reinforcement Learning with Model-Agnostic Meta-Learning in Pytorch

Can the inner update use an advanced optimizer such as Adam/RMSprop? #8

Closed: dragen1860 closed this issue 6 years ago

dragen1860 commented 6 years ago

Hi, I found your code very readable and elegant, thanks. I have a question about implementing MAML. For the inner update, we do:

        for (name, param), grad in zip(self.named_parameters(), grads):
            updated_params[name] = param - step_size * grad

or

    fast_weights = list(map(lambda p: p[1] - self.train_lr * p[0], zip(grad, fast_weights)))

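For context, the key detail in either form is that the inner-loop gradients are taken with create_graph=True, so the update itself stays in the computation graph. A minimal sketch of that step (function and argument names like inner_sgd_update, model, and step_size are illustrative, not necessarily this repo's API):

    import torch
    from collections import OrderedDict

    def inner_sgd_update(model, inner_loss, step_size=0.5, first_order=False):
        # Gradients of the inner loss w.r.t. the current parameters.
        # create_graph=True keeps the SGD update in the computation graph,
        # so the outer loop can differentiate through it (full MAML).
        grads = torch.autograd.grad(inner_loss, model.parameters(),
                                    create_graph=not first_order)
        updated_params = OrderedDict()
        for (name, param), grad in zip(model.named_parameters(), grads):
            updated_params[name] = param - step_size * grad
        return updated_params
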
While in the outer update, we can still use simple SGD:

        for (name, param), grad in zip(self.named_parameters(), grads):
            updated_params[name] = param - step_size * grad

Or

    meta_op.backward()
    adam_optim.step()

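To make the whole pattern concrete, here is a toy end-to-end sketch (a single nn.Linear, made-up data, and a hand-written functional forward for the adapted parameters; none of this is taken from the repo):

    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(4, 1)
    meta_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    support_x, support_y = torch.randn(8, 4), torch.randn(8, 1)
    query_x, query_y = torch.randn(8, 4), torch.randn(8, 1)

    # Inner update: one differentiable SGD step (create_graph=True keeps it in the graph).
    inner_loss = F.mse_loss(model(support_x), support_y)
    grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
    updated_params = {name: p - 0.5 * g
                      for (name, p), g in zip(model.named_parameters(), grads)}

    # Outer update: evaluate the adapted parameters on query data and let Adam
    # update the meta-parameters; the outer optimizer is never differentiated through.
    query_pred = F.linear(query_x, updated_params['weight'], updated_params['bias'])
    meta_loss = F.mse_loss(query_pred, query_y)

    meta_optimizer.zero_grad()
    meta_loss.backward()   # backpropagates through the inner SGD step
    meta_optimizer.step()
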
My question is: can the inner update use Adam/RMSprop, and why or why not? Would it corrupt the computation graph?

tristandeleu commented 6 years ago

There are a couple of reasons why we don't use a more complex update in the inner loop in MAML:

- MAML backpropagates through the inner-loop update in the outer loop, so the update has to stay a simple differentiable expression in the computation graph; a plain gradient step like param - step_size * grad keeps those higher-order derivatives tractable.
- Optimizers such as Adam/RMSprop keep running statistics of the gradients, so differentiating through their update rule would be considerably more involved and expensive.
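
As a toy illustration of why that matters (my own example, not from the original thread): a hand-written SGD step stays in the autograd graph, whereas torch.optim optimizers update parameters in place under no_grad, so their step leaves nothing to backpropagate through:

    import torch

    w = torch.nn.Parameter(torch.randn(3))
    loss = (w ** 2).sum()

    # Manual differentiable update: the updated tensor has a grad_fn, so a
    # loss computed from w_fast can still be backpropagated to w.
    grad, = torch.autograd.grad(loss, w, create_graph=True)
    w_fast = w - 0.1 * grad
    print(w_fast.grad_fn is not None)   # True

    # Built-in optimizer update: step() modifies w in place outside the graph,
    # so there is nothing to differentiate through afterwards.
    optimizer = torch.optim.Adam([w], lr=0.1)
    (w ** 2).sum().backward()
    optimizer.step()
    print(w.grad_fn)                    # None: w is a leaf, updated in place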

That being said, this is specific to MAML, where you want to compute the full update. If you are using the first-order approximation of MAML (as described in the paper), where you don't have to compute higher-order derivatives, then you can perfectly well use what you are describing.
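
For completeness, a rough sketch of what that first-order variant could look like, where the inner loop really can use Adam because nothing is differentiated through the update (FOMAML-style; the tiny linear model and all names are illustrative, not this repo's code):

    import copy
    import torch
    import torch.nn.functional as F

    model = torch.nn.Linear(4, 1)
    meta_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    support_x, support_y = torch.randn(8, 4), torch.randn(8, 1)
    query_x, query_y = torch.randn(8, 4), torch.randn(8, 1)

    # First-order MAML: adapt a copy of the model with any optimizer
    # (no higher-order derivatives are taken, so Adam/RMSprop is fine here).
    fast_model = copy.deepcopy(model)
    inner_optimizer = torch.optim.Adam(fast_model.parameters(), lr=0.01)
    for _ in range(5):
        inner_optimizer.zero_grad()
        F.mse_loss(fast_model(support_x), support_y).backward()
        inner_optimizer.step()

    # Use the adapted model's query-set gradients as the meta-gradient.
    query_loss = F.mse_loss(fast_model(query_x), query_y)
    grads = torch.autograd.grad(query_loss, fast_model.parameters())

    meta_optimizer.zero_grad()
    for param, grad in zip(model.parameters(), grads):
        param.grad = grad
    meta_optimizer.step()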