Closed BigBadBurrow closed 4 years ago
Remember, the array is reversed, so I think you'd need to set discounted_reward = 0 first, before the terminal step's reward is added:
if is_terminal:
    discounted_reward = 0
discounted_reward = reward + (self.gamma * discounted_reward)
rewards.insert(0, discounted_reward)
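For anyone who wants to check the numbers, here is a minimal self-contained sketch of that loop. The function name `discounted_returns` and its parameters are made up for illustration and are not taken from the repository; `is_terminals` is assumed to be a list of per-step flags aligned with `rewards`, where True marks the last step of an episode.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float],
    is_terminals: Sequence[bool],
    gamma: float,
) -> List[float]:
    """Discounted return per step, reset at every episode boundary.

    Hypothetical helper: the name and signature are assumptions.
    """
    returns: List[float] = []
    discounted_reward = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            # Last step of an episode: nothing from the following
            # episode may leak into this one.
            discounted_reward = 0.0
        discounted_reward = reward + gamma * discounted_reward
        returns.insert(0, discounted_reward)
    return returns


# Two episodes stored back to back: 3 steps, then 2 steps, reward 1 each.
example = discounted_returns(
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [False, False, True, False, True],
    gamma=0.5,
)
# example == [1.75, 1.5, 1.0, 1.5, 1.0]
```

Without the reset, the same input gives [1.9375, 1.875, 1.75, 1.5, 1.0], so the first episode's returns are inflated by rewards from the second.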
Thanks!
In the update() method, discounted_reward is always calculated as the current reward plus gamma times the previous discounted_reward, but there's no break between episodes, so the reward from one episode is carried across to the next, which I assume cannot be correct. I suggest adding a terminal_states list to the Memory class, and then setting discounted_reward = 0 when a new episode starts.
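A rough sketch of what that could look like, assuming the rollout loop knows when an episode ends. Only the two lists relevant here are shown; the repository's actual Memory class presumably holds more, and the `add()` helper is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    """Reduced rollout buffer (illustrative, not the repository's class)."""

    rewards: List[float] = field(default_factory=list)
    # One flag per stored reward; True marks the last step of an episode.
    terminal_states: List[bool] = field(default_factory=list)

    def add(self, reward: float, is_terminal: bool) -> None:
        # Hypothetical helper: keeps both lists the same length.
        self.rewards.append(reward)
        self.terminal_states.append(is_terminal)

    def clear(self) -> None:
        self.rewards.clear()
        self.terminal_states.clear()


memory = Memory()
memory.add(1.0, False)
memory.add(1.0, True)   # episode 1 ends here
memory.add(1.0, False)  # first step of episode 2
```

update() can then walk both lists in reverse together and reset discounted_reward whenever the flag is True.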