Closed. dragen1860 closed this issue 6 years ago.
There are a couple of reasons why we don't use a more complex update in the inner loop in MAML: you need to compute the inner-loop gradients with `create_graph=True` in order to compute higher-order derivatives for the meta-update. However, as far as I know, there is no such option for a PyTorch optimizer's `.step()`, so I think you can't really use that in association with a PyTorch optimizer. That being said, this is specific to MAML, where you want to compute the full update. If you are using the first-order approximation of MAML (as described in the paper), where you don't have to compute higher-order derivatives, then you can perfectly well use what you are describing.
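The differentiable inner update described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the tiny linear model, random data, and learning rates are all assumptions for the sake of the example.

```python
import torch

# Minimal MAML-style sketch (assumed setup): one differentiable inner SGD
# step on a linear model, followed by a meta-update on the query loss.
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)            # meta-parameters
x_support, y_support = torch.randn(5, 3), torch.randn(5)
x_query, y_query = torch.randn(5, 3), torch.randn(5)
inner_lr, outer_lr = 0.1, 0.01

# Inner update: compute gradients with create_graph=True so the update
# itself stays in the computation graph (needed for second-order MAML).
inner_loss = ((x_support @ w - y_support) ** 2).mean()
(grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
w_adapted = w - inner_lr * grad                   # functional, out-of-place update

# Outer update: backprop the query loss through the inner step, then take
# a plain SGD step on the meta-parameters.
outer_loss = ((x_query @ w_adapted - y_query) ** 2).mean()
outer_loss.backward()                             # w.grad now holds the meta-gradient
with torch.no_grad():
    w -= outer_lr * w.grad
```

Note that `w_adapted` is produced by tensor arithmetic rather than `optimizer.step()`, which is exactly what keeps the inner update differentiable.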
Hi, I found your code very readable and elegant, thanks. I have a question about implementing MAML. When doing the inner update:
or
while in the outer update we can still use simple SGD,
Or
My question is: can the inner update use Adam/RMSprop? Why or why not? Would it corrupt the computation graph?
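To see why a stateful optimizer in the inner loop is problematic, here is a small sketch of standard `torch.optim` usage (this is an illustration I'm adding, not code from this repository): `optimizer.step()` mutates the parameters in place, outside the autograd graph.

```python
import torch

# Sketch: optimizer.step() updates parameters in place and outside autograd,
# so the adapted weights are not a differentiable function of the
# pre-update meta-parameters.
w = torch.randn(3, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()           # populates w.grad; no graph through the update
opt.step()                # in-place, non-differentiable update

# After step(), w is still a leaf tensor: nothing connects it back to its
# pre-update value, so a later backward pass cannot produce the MAML
# second-order term through this update.
assert w.is_leaf and w.grad_fn is None
```

This is why Adam/RMSprop in the inner loop is only compatible with the first-order approximation, where gradients are not propagated through the inner update.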