Open rarilurelo opened 7 years ago
Hi, thanks for your implementation of TRPO.
In https://github.com/wojzaremba/trpo/blob/master/main.py#L128-L132 you normalize the advantage function. I couldn't find any description of this operation in the paper (https://arxiv.org/abs/1502.05477). Why did you do that?
I found it in John Schulman's code. This normalization introduces bias, but it's a sensible thing to do.
Thanks! I found it here: https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L59-L62
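For anyone landing on this thread, here is a minimal sketch of what batch advantage normalization typically looks like: subtract the batch mean and divide by the batch standard deviation. It is not copied from either repository linked above; the function name `normalize_advantages` and the `eps` constant are assumptions made for illustration.

```python
import numpy as np


def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantage estimates to zero mean, unit std.

    `eps` is a hypothetical small constant that guards against division
    by zero when every advantage in the batch is identical.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)


# Toy batch of advantage estimates (made-up numbers, illustration only).
advantages = np.array([1.0, 2.0, 3.0, 4.0])
normalized = normalize_advantages(advantages)
```

Because the mean and standard deviation are computed from the same batch that is being rescaled, the resulting gradient estimate is no longer unbiased, which is presumably the bias mentioned above. In practice it keeps the scale of the advantages consistent from batch to batch.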