Closed naveen7v closed 6 years ago
Hi,
In the CartPole DDQN example, the following Q(s,a) update uses target_val. Is target_val the one-step reward, or the expected future reward?
target[i][action[i]] = reward[i] + self.discount_factor * ( target_val[i][a])
OK, I see, it's like TD(0): target_val is the expected future reward from the next state onward, i.e. after one step.
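For anyone landing here later, below is a minimal sketch of how that line could fit into a double DQN target computation. It is not the repository's code. It assumes that `a` is the greedy action chosen by the online network in the next state and that `target_val` holds the target network's Q-values for the next state. The function name `ddqn_targets`, its arguments, and the terminal-state branch are hypothetical and used only for illustration.

```python
def ddqn_targets(q_current, q_next_online, q_next_target,
                 actions, rewards, dones, discount_factor=0.99):
    """Build TD(0) targets for a batch (illustrative sketch, names are assumptions).

    q_current     : online-network Q-values for the current states
    q_next_online : online-network Q-values for the next states (picks the action)
    q_next_target : target-network Q-values for the next states (evaluates it)
    """
    # Copy so that only the entry for the action actually taken changes.
    targets = [row[:] for row in q_current]
    for i in range(len(targets)):
        if dones[i]:
            # Assumption: no bootstrapping after a terminal transition.
            targets[i][actions[i]] = rewards[i]
        else:
            # Double DQN: select the action with the online network...
            a = max(range(len(q_next_online[i])),
                    key=lambda j: q_next_online[i][j])
            # ...and evaluate it with the target network. This is the line
            # from the question: one-step reward plus the discounted
            # estimate of the future return.
            targets[i][actions[i]] = rewards[i] + discount_factor * q_next_target[i][a]
    return targets


# Tiny made-up batch of two transitions with two actions each.
targets = ddqn_targets(
    q_current=[[1.0, 2.0], [0.5, 0.2]],
    q_next_online=[[0.3, 0.9], [0.4, 0.1]],
    q_next_target=[[5.0, 4.0], [3.0, 7.0]],
    actions=[0, 1],
    rewards=[1.0, 1.0],
    dones=[False, True],
    discount_factor=0.5,
)
print(targets)
```

So `reward[i]` is the one-step part, and `target_val[i][a]` is the bootstrapped estimate of everything that follows.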