number9473 / nn-algorithm

algorithm for neural network

Playing Atari with Deep Reinforcement Learning #250

Open joyhuang9473 opened 6 years ago

joyhuang9473 commented 6 years ago

Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Origin: https://arxiv.org/abs/1312.5602
Related:
- Easy-to-follow, step-by-step PyTorch tutorial on Deep Q-Learning with clean, readable code: https://github.com/higgsfield/RL-Adventure
- https://github.com/pecu/PyTorch_CSX/blob/master/08_DQN/DQN_cartPole.ipynb
- https://becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26
- https://github.com/erikjandevries/dl-breakout-docker
- https://joshgreaves.com/reinforcement-learning/
- https://www.cs.cmu.edu/~katef/DeepRLControlCourse/lectures/lecture2_mdps.pdf
- https://blog.csdn.net/songrotek/article/details/50580904

DQN

joyhuang9473 commented 6 years ago

Reinforcement Learning: A Detailed Explanation of the DQN Algorithm https://wanjun0511.github.io/2017/11/05/DQN/

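Being off-policy is a defining trait of Q-Learning, and DQN keeps it. The difference is that in Q-Learning, the Q used to compute the target and the Q used to compute the predicted value are the same Q; in other words, both come from the same neural network. The problem this creates is that every update to the network also changes the target, which easily keeps the parameters from converging. Recall that in supervised learning the labels are fixed and do not change as the parameters are updated.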
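The moving-target problem can be written out in symbols. This is only a sketch: the notation ($\theta$ for the network parameters, $\gamma$ for the discount factor, $r$ for the reward, $s'$ for the next state) is assumed here and does not come from the linked post.

$$
y = r + \gamma \max_{a'} Q(s', a'; \theta), \qquad L(\theta) = \big(y - Q(s, a; \theta)\big)^2
$$

Since $y$ is computed from the same $\theta$ that the gradient step changes, each update moves the prediction $Q(s, a; \theta)$ and the target $y$ together, unlike a fixed supervised label.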

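DQN therefore adds a target Q network alongside the original Q network, used only to compute the target. It has the same structure as the Q network and starts from the same weights, but while the Q network is updated at every iteration, the target Q network is updated only once every fixed number of steps.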
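The two-network update schedule described above could look roughly like the sketch below. It is a minimal illustration, not the paper's implementation: a small table (`QTable`) stands in for the neural network so the example runs with the standard library alone, and the names `train` and `sync_every` as well as the values of `gamma` and `lr` are hypothetical choices made for this example.

```python
import copy


class QTable:
    """Stand-in for a Q network: one value per (state, action) pair."""

    def __init__(self, n_states, n_actions):
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def max_value(self, state):
        # Largest Q value over all actions in the given state.
        return max(self.q[state])


def train(transitions, n_states, n_actions, gamma=0.9, lr=0.1, sync_every=4):
    """Run Q updates with a separate, periodically synced target copy."""
    online = QTable(n_states, n_actions)
    # Same structure and same initial weights as the online Q.
    target = copy.deepcopy(online)
    syncs = 0
    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        # The target value comes from the frozen copy, so it stays
        # fixed between syncs, much like a supervised label.
        y = r if done else r + gamma * target.max_value(s_next)
        # The online Q is updated at every step.
        online.q[s][a] += lr * (y - online.q[s][a])
        # The target copy is refreshed only every `sync_every` steps.
        if step % sync_every == 0:
            target = copy.deepcopy(online)
            syncs += 1
    return online, target, syncs


if __name__ == "__main__":
    # Ten identical (state, action, reward, next_state, done) transitions.
    data = [(0, 0, 1.0, 1, False)] * 10
    online, target, syncs = train(data, n_states=2, n_actions=2)
    print(syncs, online.q[0][0], target.q[0][0])
```

With a real network, the `copy.deepcopy` call would be replaced by copying the online weights into the target network; the schedule (update one every step, the other every `sync_every` steps) stays the same.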