As far as I know, the current MAML code only fits the value function to episodes from one round of on-policy rollouts during adaptation. But the code here seems to have a bug:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L176
This loss only backpropagates gradients into `self._value_function`, which is a deep copy of `inner_algo._value_function`:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L67
However, the optimizer is actually optimizing `inner_algo._value_function` instead of `self._value_function`:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L178
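If I understand the code correctly, the problem reduces to the standalone sketch below. The names here are placeholders standing in for `self._value_function` and `inner_algo._value_function`, not garage's actual API: the loss flows into the deep copy, while the optimizer holds the original's parameters, so the optimizer step is effectively a no-op.

```python
import copy

import torch
from torch import nn

# inner_value_function stands in for inner_algo._value_function,
# value_function for its deep copy, self._value_function.
inner_value_function = nn.Linear(4, 1)
value_function = copy.deepcopy(inner_value_function)

# The optimizer is built over the ORIGINAL module's parameters.
optimizer = torch.optim.Adam(inner_value_function.parameters())

obs, returns = torch.randn(8, 4), torch.randn(8, 1)
# The loss is computed with the COPY, so gradients flow only into the copy.
loss = ((value_function(obs) - returns) ** 2).mean()

optimizer.zero_grad()
loss.backward()
print(inner_value_function.weight.grad)  # None: the optimized module got no gradients
print(value_function.weight.grad)        # populated: the copy got the gradients
optimizer.step()                         # updates nothing, since all grads are None
```

A possible fix would be to construct the optimizer over the same object the loss touches (or to copy the fitted parameters back into `inner_algo._value_function` after the update), though I haven't checked which variant matches the intended semantics.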
It seems the value function never gets updated. I can confirm that there are gradients for `self._value_function` but no gradients for `inner_algo._value_function`.
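A minimal check along these lines (the helper is hypothetical, not part of garage) reproduces that observation:

```python
import torch
from torch import nn


def received_gradients(module: nn.Module) -> bool:
    """Return True if any parameter of `module` received a gradient from backward()."""
    return any(p.grad is not None for p in module.parameters())


# Continuing the sketch above, after loss.backward():
#   received_gradients(value_function)        -> True   (self._value_function)
#   received_gradients(inner_value_function)  -> False  (inner_algo._value_function)
```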