rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License

Bug with MAML's value function training? #2301

Closed chongyi-zheng closed 3 years ago

chongyi-zheng commented 3 years ago

As far as I know, the current MAML code only fits the value function to episodes from one round of on-policy rollouts during adaptation. But the code here seems to have a bug:

https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L176

This loss only backpropagates gradients into self._value_function, which is a deep copy of inner_algo._value_function:

https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L67

However, the optimizer is actually optimizing inner_algo._value_function instead of self._value_function:

https://github.com/rlworkgroup/garage/blob/731101898450da6c64fd6f6cdfc9910e4a7d8ea6/src/garage/torch/algos/maml.py#L178

It seems the value function never gets updated. I can confirm that there are gradients for self._value_function but no gradients for inner_algo._value_function:

[image: screenshot attached to the original issue]
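The mismatch described above can be reproduced outside garage with a minimal sketch. It assumes plain PyTorch and uses hypothetical stand-ins (`inner_vf`, `outer_vf`, a small `nn.Linear`, random data) rather than the actual MAML classes. The loss is computed from the deep copy, while the optimizer holds the parameters of the original module:

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: inner_vf plays the role of inner_algo._value_function,
# outer_vf plays the role of self._value_function (a deep copy).
inner_vf = nn.Linear(4, 1)
outer_vf = copy.deepcopy(inner_vf)  # same initial values, separate parameters

# The optimizer is built over the ORIGINAL module's parameters.
optimizer = torch.optim.SGD(inner_vf.parameters(), lr=0.1)

obs = torch.randn(8, 4)       # made-up observations
returns = torch.randn(8, 1)   # made-up regression targets

before_inner = [p.detach().clone() for p in inner_vf.parameters()]
before_outer = [p.detach().clone() for p in outer_vf.parameters()]

optimizer.zero_grad()
loss = nn.functional.mse_loss(outer_vf(obs), returns)  # loss uses the COPY
loss.backward()
optimizer.step()  # steps parameters that never received gradients

copy_has_grads = all(p.grad is not None for p in outer_vf.parameters())
inner_has_grads = any(p.grad is not None for p in inner_vf.parameters())
inner_unchanged = all(
    torch.equal(a, b) for a, b in zip(before_inner, inner_vf.parameters()))
outer_unchanged = all(
    torch.equal(a, b) for a, b in zip(before_outer, outer_vf.parameters()))

# One possible remedy in this toy setting (not necessarily what the library
# does): build the optimizer over the module that produces the loss.
fixed_optimizer = torch.optim.SGD(outer_vf.parameters(), lr=0.1)
fixed_optimizer.zero_grad()
nn.functional.mse_loss(outer_vf(obs), returns).backward()
fixed_optimizer.step()
outer_changed_after_fix = any(
    not torch.equal(a, b) for a, b in zip(before_outer, outer_vf.parameters()))
```

In this sketch the copy ends up with gradients and the original does not, and neither module's weights move until the optimizer is pointed at the module that actually produced the loss.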

avnishn commented 3 years ago

To use a working version of MAML, check out #2287.