wojzaremba / trpo

Normalize advantage function #6

Open rarilurelo opened 7 years ago

rarilurelo commented 7 years ago

Hi, thanks for your implementation of TRPO.

In https://github.com/wojzaremba/trpo/blob/master/main.py#L128-L132 you normalize the advantage function. I couldn't find any description of this operation in the paper (https://arxiv.org/abs/1502.05477). Why did you do that?

wojzaremba commented 7 years ago

I found it in John Schulman's code. The normalization is biased, but it's sensible.

rarilurelo commented 7 years ago

Thanks! I found it here: https://github.com/joschu/modular_rl/blob/master/modular_rl/core.py#L59-L62
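
For anyone landing on this issue later: the operation being discussed amounts to standardizing the batch of advantage estimates to zero mean and unit standard deviation before they enter the surrogate loss. Below is a minimal NumPy sketch of that idea, not a copy of either repository's code. The function name `normalize_advantages` and the `eps` constant are assumptions chosen for illustration.

```python
import numpy as np


def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantage estimates.

    Hypothetical helper for illustration; `eps` is an assumed small constant
    that guards against division by zero when all advantages are equal.
    """
    advantages = np.asarray(advantages, dtype=np.float64)
    # Center the batch so positive and negative advantages balance out.
    centered = advantages - advantages.mean()
    # Rescale so the magnitude of the update does not depend on reward scale.
    return centered / (advantages.std() + eps)


if __name__ == "__main__":
    batch = np.array([1.0, 2.0, 3.0, 4.0])
    print(normalize_advantages(batch))
```

Because the mean and standard deviation are computed from the same samples they rescale, the resulting policy-gradient estimate is presumably no longer strictly unbiased, which seems to be the bias mentioned above. In practice the step mainly makes the update size less sensitive to the scale of the rewards.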