makaveli10 opened this issue 1 year ago
Consider using a neural network to solve Atari games: feeding only a single frame to the network is not enough, because a single frame carries no temporal information. In Pong, for example, the agent would not be able to infer the direction the ball is moving.
To overcome this limitation, we stack multiple consecutive frames together to give temporal information to the agent.
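A minimal sketch of frame stacking, assuming the standard DQN preprocessing of 4 grayscale 84x84 frames (the `FrameStack` class and its method names are illustrative, not from this post):

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the last `k` frames and exposes them as a single stacked observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At the start of an episode, repeat the first frame k times.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, new_frame):
        # Drop the oldest frame and append the newest one.
        self.frames.append(new_frame)
        return self.observation()

    def observation(self):
        # Shape (k, H, W): the channel axis now carries temporal information,
        # e.g. the ball's position across 4 consecutive Pong frames.
        return np.stack(self.frames, axis=0)
```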
Replay Memory
Instead of training on consecutive transitions, we store experiences (state, action, reward, next state, done) in a buffer and sample random mini-batches from it; this breaks the correlation between consecutive samples and lets each transition be reused for several updates.
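A minimal sketch of a uniform replay buffer (the class name, capacity, and batch size are assumptions for illustration):

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size buffer that stores transitions and samples uniform mini-batches."""

    def __init__(self, capacity=100_000):
        # Old transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```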
Double DQN
When we compute the Q-target, we use two networks to decouple action selection from target Q-value generation: the online network selects the best action for the next state, and the target network evaluates the Q-value of that action.
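A sketch of the Double DQN target computation in PyTorch (the function name and arguments are assumptions; `online_net` and `target_net` stand for the two networks described above):

```python
import torch


@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN targets: action selection by the online net,
    action evaluation by the target net."""
    # 1. Online network picks the greedy action in the next state.
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # 2. Target network evaluates that action.
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # 3. Bootstrap only for non-terminal transitions.
    return rewards + gamma * next_q * (1.0 - dones)
```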
Training
Using RMSprop instead of vanilla SGD helps. The exploration rate ε is annealed from 1.0 to 0.1 (or 0.05) over the first million steps.
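As a sketch, a linear schedule is one common way to implement this annealing; the RMSprop hyperparameters below are assumptions roughly in line with the original DQN paper, not values from this post:

```python
import torch
import torch.nn as nn


def epsilon_by_step(step, eps_start=1.0, eps_end=0.1, anneal_steps=1_000_000):
    """Linearly anneal the exploration rate from eps_start to eps_end
    over the first `anneal_steps` environment steps, then hold it constant."""
    fraction = min(step / anneal_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# Placeholder model just to show the optimizer setup; the real model is the
# convolutional Q-network discussed below.
q_network = nn.Linear(4, 2)

# RMSprop instead of vanilla SGD; these hyperparameter values are assumptions,
# not taken from the post.
optimizer = torch.optim.RMSprop(q_network.parameters(), lr=2.5e-4, alpha=0.95, eps=0.01)
```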
For environments with enormous state spaces, for example on the order of 10^60 states, or Atari games with 10^308 states or even more, building and updating a Q-table is intractable, so we use a neural network to approximate the value functions.
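As an illustration of such a function approximator, here is a sketch of the convolutional Q-network from the Nature DQN paper, which takes 4 stacked 84x84 frames and outputs one Q-value per action (the layer sizes come from that paper, not from this post):

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Convolutional Q-network in the style of the Nature DQN architecture."""

    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        # x: (batch, 4, 84, 84), pixel values scaled to [0, 1]
        return self.head(self.features(x))


# Example: Q-values for a batch of one stacked observation in a 6-action game.
q_net = QNetwork(num_actions=6)
print(q_net(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 6])
```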