tonegas closed this pull request 6 years ago
I have modified how the value of target[0][action] is obtained, because in the paper https://arxiv.org/abs/1312.5602 this value is computed as
target[0][action] = r_j + gamma * max_{a'}( Qhat( state_{j+1}, a' ) )
and not, as in the previous code, as
target[0][action] = r_j + gamma * Qhat( state_{j+1}, argmax_{a'}( Q( state_{j+1}, a' ) ) )
With this fix the algorithm is more stable.
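For anyone skimming the diff, here is a minimal NumPy sketch of the two targets side by side; the variable names, gamma, and the toy Q-values are assumptions for illustration, not the repository's actual code:

```python
import numpy as np

gamma = 0.99  # discount factor (assumed value)
r_j = 1.0     # reward observed at step j (toy value)

# Toy Q-values for state_{j+1}; shapes and numbers are illustrative only.
q_hat_next = np.array([0.2, 1.5, 0.7])  # Qhat(state_{j+1}, .) from the target network
q_next = np.array([1.0, 0.3, 0.9])      # Q(state_{j+1}, .) from the online network

# Fixed target, as in the paper: bootstrap with the target network's own maximum.
target_fixed = r_j + gamma * np.max(q_hat_next)           # 1.0 + 0.99 * 1.5 = 2.485

# Previous code: evaluate Qhat at the action the online network prefers
# (a Double-DQN-style target).
target_old = r_j + gamma * q_hat_next[np.argmax(q_next)]  # 1.0 + 0.99 * 0.2 = 1.198
```

The fixed value is what would then be written into target[0][action] before fitting the model on that sample.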
Sorry it took forever to merge this PR. Thanks!