boscotsang opened this issue 5 years ago
Hi,
According to this comment, it seems to be just for convenience.
Modifying it to self.I.buf_rews_int.T[::-1]
will not change the std significantly, I think.
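For example, here is a quick self-contained check (the filter below just reproduces the gamma * prev + rew update that rff_int appears to perform; the discount value and the dummy reward buffer are made up for illustration), suggesting that feeding the buffer forward or reversed gives nearly the same std:

```python
import numpy as np

GAMMA = 0.99  # assumed intrinsic discount, not necessarily the repo's value

class RewardForwardFilter:
    """Running discounted sum, updated one timestep at a time
    (a sketch of the update rff_int is assumed to perform)."""
    def __init__(self, gamma):
        self.rewems = None
        self.gamma = gamma

    def update(self, rews):
        if self.rewems is None:
            self.rewems = rews
        else:
            self.rewems = self.rewems * self.gamma + rews
        return self.rewems

def filtered_std(buf_rews, gamma=GAMMA):
    """Feed a (n_envs, n_steps) reward buffer through the filter step by step
    (iterating over buf_rews.T) and return the std of all filtered values."""
    rff = RewardForwardFilter(gamma)
    rffs = np.array([rff.update(rew) for rew in buf_rews.T])
    return rffs.std()

# Dummy positive rewards standing in for self.I.buf_rews_int
rng = np.random.default_rng(0)
buf_rews_int = rng.exponential(scale=1.0, size=(32, 128))

print("forward  std:", filtered_std(buf_rews_int))
# Reversing the time axis first is equivalent to using buf_rews_int.T[::-1]
print("reversed std:", filtered_std(buf_rews_int[:, ::-1]))
```

The two numbers come out very close for a roughly stationary reward stream, which is the sense in which the ordering seems to be mostly a matter of convenience.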
Exactly. :+1:
I think they have made a mistake!
It should have been self.I.buf_rews_int.T[::-1],
as 4kasha has mentioned.
In ppo_agent.py, the running estimate of the intrinsic returns is computed with rff_int:
rffs_int = np.array([self.I.rff_int.update(rew) for rew in self.I.buf_rews_int.T])
In reinforcement learning, returns are computed as $\sum_t \gamma^t r_t$. However, rff_int seems to compute the returns as $\sum_t \gamma^{T-t} r_t$, which discounts the rewards forward in time. What is the reason for computing the intrinsic returns forward? Thanks!
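To make the difference I mean concrete, here is a small sketch (toy discount and reward sequence, not taken from the repo) contrasting the value a forward filter accumulates at each step with the conventional return:

```python
import numpy as np

gamma = 0.99                                 # toy discount for illustration
rews = np.array([1.0, 0.0, 2.0, 0.5, 3.0])   # toy rewards r_0 .. r_{T-1}

# What a running filter produces when fed rewards in time order:
# f_t = sum_{k <= t} gamma^(t-k) * r_k, i.e. earlier rewards are discounted
# more, so the rewards are discounted "forward" in time.
f, forward_filtered = 0.0, []
for r in rews:
    f = gamma * f + r
    forward_filtered.append(f)

# The conventional return at each step:
# G_t = sum_{k >= t} gamma^(k-t) * r_k, i.e. later rewards are discounted more.
g, returns = 0.0, []
for r in rews[::-1]:
    g = r + gamma * g
    returns.append(g)
returns = returns[::-1]

print("forward-filtered:", np.round(forward_filtered, 3))
print("returns G_t:     ", np.round(returns, 3))
```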