makaveli10 / rl


Deep Q-Learning #4

Open makaveli10 opened 1 year ago

makaveli10 commented 1 year ago

Consider using a neural network to solve Atari games: feeding only a single frame to the network is not enough, because a single frame carries no temporal information. For example, in Pong the agent could not infer which direction the ball is moving.

To overcome this limitation, we stack multiple consecutive frames together so the agent receives temporal information, as in the sketch below.
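A minimal sketch of frame stacking (not code from this repo): it assumes 84x84 grayscale frames and a stack of 4, the values commonly used for Atari preprocessing; the `FrameStack` class and its methods are hypothetical names.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the last `k` preprocessed frames and returns them stacked
    along the channel axis, so the network can infer motion."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At the start of an episode, repeat the first frame k times.
        for _ in range(self.k):
            self.frames.append(frame)
        return self._observation()

    def step(self, frame):
        # Push the newest frame; the oldest one falls off the deque.
        self.frames.append(frame)
        return self._observation()

    def _observation(self):
        # Shape: (k, H, W), e.g. (4, 84, 84) for standard Atari preprocessing.
        return np.stack(self.frames, axis=0)


# Usage with dummy 84x84 grayscale frames:
stacker = FrameStack(k=4)
obs = stacker.reset(np.zeros((84, 84), dtype=np.uint8))
obs = stacker.step(np.ones((84, 84), dtype=np.uint8))
print(obs.shape)  # (4, 84, 84)
```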

makaveli10 commented 1 year ago

Replay Memory
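In standard DQN, replay memory is a fixed-size buffer of (state, action, reward, next_state, done) transitions; sampling minibatches uniformly at random breaks the correlation between consecutive experiences and lets each transition be reused for several updates. A minimal sketch, assuming a deque-backed buffer (the `ReplayMemory` name and its API are assumptions, not this repo's code):

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size buffer of transitions sampled uniformly at random."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition; the oldest transition is dropped when full.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, returned as stacked arrays.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.stack(states),
                np.array(actions),
                np.array(rewards, dtype=np.float32),
                np.stack(next_states),
                np.array(dones, dtype=np.float32))

    def __len__(self):
        return len(self.buffer)
```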

makaveli10 commented 1 year ago

Double DQN

When we compute the Q-target, we use two networks to decouple action selection from target Q-value generation: the online network selects the greedy next action, and the target network evaluates that action's Q-value.
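A sketch of how the two networks could be combined when building the target (PyTorch; `online_net`, `target_net`, the `double_dqn_targets` helper, and `gamma=0.99` are assumptions for illustration, not this repo's code):

```python
import torch


@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: online net selects the action, target net evaluates it."""
    # Action selection with the online network.
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # Action evaluation with the target network.
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # One-step TD target; terminal transitions get no bootstrap term.
    return rewards + gamma * (1.0 - dones) * next_q
```

Compared with vanilla DQN, which takes the max over the target network's own Q-values, this decoupling reduces the overestimation bias of the targets.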

makaveli10 commented 1 year ago

Training
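A hedged sketch of a single training update that ties the pieces above together: sample a minibatch from replay memory, build Double DQN targets, and minimize the TD error with a Huber loss. Names such as `train_step`, `online_net`, `target_net`, `optimizer`, `memory`, `batch_size=32`, and `gamma=0.99` are assumptions for illustration, not this repo's code; periodically copying the online weights into the target network is assumed to happen outside this function.

```python
import torch
import torch.nn.functional as F


def train_step(online_net, target_net, optimizer, memory, batch_size=32, gamma=0.99):
    """One DQN update from a replay-memory minibatch using Double DQN targets."""
    states, actions, rewards, next_states, dones = memory.sample(batch_size)

    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken in the sampled transitions.
    q_values = online_net(states).gather(1, actions).squeeze(1)

    # Double DQN target: online net selects, target net evaluates.
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    # Huber (smooth L1) loss on the TD error, then a gradient step.
    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```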