Expected future rewards

rlcode / reinforcement-learning

Minimal and Clean Reinforcement Learning Examples

MIT License


Closed: naveen7v closed 6 years ago

naveen7v commented 6 years ago

Hi,

In the CartPole DDQN example, the following Q(s,a) update uses target_val. Is it the one-step reward, or is it the expected future rewards?

target[i][action[i]] = reward[i] + self.discount_factor * ( target_val[i][a])

naveen7v commented 6 years ago

OK, I see, it's like TD(0): reward[i] is the one-step reward, and target_val is the estimate of the expected future rewards from the next state, i.e. after one step.
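To make the TD(0) reading concrete, here is a minimal sketch of how such a target could be computed for a small batch. It is not the repository's code: the function name `ddqn_targets`, its argument names, and the example numbers are all assumptions for illustration. It also assumes that `a` in the line above is the greedy action chosen by the online network in the next state, and that terminal transitions use the reward alone, which is the usual Double DQN setup.

```python
import numpy as np

def ddqn_targets(q_current, q_next_online, q_next_target,
                 actions, rewards, dones, discount_factor):
    """Hypothetical helper: build TD(0) targets in the Double DQN style.

    q_current     : Q(s, .) from the online network, shape (batch, n_actions)
    q_next_online : Q(s', .) from the online network (used to pick the action)
    q_next_target : Q(s', .) from the target network (used to evaluate it)
    """
    targets = np.array(q_current, dtype=float)  # copy, so the input is not modified
    for i in range(len(actions)):
        if dones[i]:
            # Terminal transition: no future rewards, only the one-step reward.
            targets[i][actions[i]] = rewards[i]
        else:
            # Pick the next action with the online net, evaluate it with the target net.
            a = int(np.argmax(q_next_online[i]))
            targets[i][actions[i]] = rewards[i] + discount_factor * q_next_target[i][a]
    return targets

# Made-up numbers, two transitions, two actions.
example = ddqn_targets(
    q_current=[[1.0, 2.0], [0.5, 0.1]],
    q_next_online=[[0.2, 0.9], [0.7, 0.3]],
    q_next_target=[[1.0, 3.0], [2.0, 5.0]],
    actions=[0, 1],
    rewards=[1.0, 1.0],
    dones=[False, True],
    discount_factor=0.5,
)
print(example)  # first row: 1.0 + 0.5 * 3.0 = 2.5 for action 0; second row: reward only
```

In the first transition the target is the one-step reward plus the discounted value estimate of the next state; only the entry for the action actually taken is overwritten, so the other actions contribute no error when the network is fit to these targets.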