As far as I know, the current MAML code only fits the value function to episodes from one round of on-policy rollouts during adaptation. But the code here seems to have a bug:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L176
This loss only backpropagates gradients into `self._value_function`, which is a deep copy of `inner_algo._value_function`:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L67
However, the optimizer is actually optimizing `inner_algo._value_function` instead of `self._value_function`:
https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L178
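If I understand the code correctly, the problem reduces to the standalone sketch below. The names here are placeholders standing in for `self._value_function` and `inner_algo._value_function`, not garage's actual API: the loss flows into the deep copy, while the optimizer holds the original's parameters, so the optimizer step is effectively a no-op.

```python
import copy

import torch
from torch import nn

# inner_value_function stands in for inner_algo._value_function,
# value_function for its deep copy, self._value_function.
inner_value_function = nn.Linear(4, 1)
value_function = copy.deepcopy(inner_value_function)

# The optimizer is built over the ORIGINAL module's parameters.
optimizer = torch.optim.Adam(inner_value_function.parameters())

obs, returns = torch.randn(8, 4), torch.randn(8, 1)
# The loss is computed with the COPY, so gradients flow only into the copy.
loss = ((value_function(obs) - returns) ** 2).mean()

optimizer.zero_grad()
loss.backward()
print(inner_value_function.weight.grad)  # None: the optimized module got no gradients
print(value_function.weight.grad)        # populated: the copy got the gradients
optimizer.step()                         # updates nothing, since all grads are None
```

A possible fix would be to construct the optimizer over the same object the loss touches (or to copy the fitted parameters back into `inner_algo._value_function` after the update), though I haven't checked which variant matches the intended semantics.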
It seems the value function never gets updated. I can confirm that there are gradients for `self._value_function` but no gradients for `inner_algo._value_function`.
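A minimal check along these lines (the helper is hypothetical, not part of garage) reproduces that observation:

```python
import torch
from torch import nn


def received_gradients(module: nn.Module) -> bool:
    """Return True if any parameter of `module` received a gradient from backward()."""
    return any(p.grad is not None for p in module.parameters())


# Continuing the sketch above, after loss.backward():
#   received_gradients(value_function)        -> True   (self._value_function)
#   received_gradients(inner_value_function)  -> False  (inner_algo._value_function)
```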