makaveli10 opened this issue 1 year ago
Consider using a neural network to solve Atari games: feeding only a single frame to the network is not enough, because a single frame carries no temporal information. In Pong, for example, the agent would not be able to infer the direction the ball is moving.
To overcome this limitation, we stack multiple consecutive frames together to give temporal information to the agent.
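A minimal sketch of frame stacking, assuming the standard DQN preprocessing of 4 grayscale 84x84 frames (the `FrameStack` class and its method names are illustrative, not from this post):

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the last `k` frames and exposes them as a single stacked observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At the start of an episode, repeat the first frame k times.
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, new_frame):
        # Drop the oldest frame and append the newest one.
        self.frames.append(new_frame)
        return self.observation()

    def observation(self):
        # Shape (k, H, W): the channel axis now carries temporal information,
        # e.g. the ball's position across 4 consecutive Pong frames.
        return np.stack(self.frames, axis=0)
```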
Replay Memory
Instead of training on consecutive transitions, we store experiences (state, action, reward, next state, done) in a buffer and sample random mini-batches from it; this breaks the correlation between consecutive samples and lets each transition be reused for several updates.
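A minimal sketch of a uniform replay buffer (the class name, capacity, and batch size are assumptions for illustration):

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size buffer that stores transitions and samples uniform mini-batches."""

    def __init__(self, capacity=100_000):
        # Old transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```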
Double DQN
When we compute the Q-target, we use two networks to decouple action selection from target Q-value generation: the online network selects the best action for the next state, and the target network evaluates the Q-value of that action.
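A sketch of the Double DQN target computation in PyTorch (the function name and arguments are assumptions; `online_net` and `target_net` stand for the two networks described above):

```python
import torch


@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN targets: action selection by the online net,
    action evaluation by the target net."""
    # 1. Online network picks the greedy action in the next state.
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # 2. Target network evaluates that action.
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # 3. Bootstrap only for non-terminal transitions.
    return rewards + gamma * next_q * (1.0 - dones)
```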
Training
Using RMSprop instead of vanilla SGD helps. The exploration rate ε is annealed from 1.0 to 0.1 (or 0.05) over the first million steps.
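As a sketch, a linear schedule is one common way to implement this annealing; the RMSprop hyperparameters below are assumptions roughly in line with the original DQN paper, not values from this post:

```python
import torch
import torch.nn as nn


def epsilon_by_step(step, eps_start=1.0, eps_end=0.1, anneal_steps=1_000_000):
    """Linearly anneal the exploration rate from eps_start to eps_end
    over the first `anneal_steps` environment steps, then hold it constant."""
    fraction = min(step / anneal_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# Placeholder model just to show the optimizer setup; the real model is the
# convolutional Q-network discussed below.
q_network = nn.Linear(4, 2)

# RMSprop instead of vanilla SGD; these hyperparameter values are assumptions,
# not taken from the post.
optimizer = torch.optim.RMSprop(q_network.parameters(), lr=2.5e-4, alpha=0.95, eps=0.01)
```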
For environments with enormous state spaces, for example on the order of 10^60 states, or Atari games with 10^308 states or even more, building and updating a Q-table is intractable, so we use a neural network to approximate the value functions.
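As an illustration of such a function approximator, here is a sketch of the convolutional Q-network from the Nature DQN paper, which takes 4 stacked 84x84 frames and outputs one Q-value per action (the layer sizes come from that paper, not from this post):

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Convolutional Q-network in the style of the Nature DQN architecture."""

    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x):
        # x: (batch, 4, 84, 84), pixel values scaled to [0, 1]
        return self.head(self.features(x))


# Example: Q-values for a batch of one stacked observation in a 6-action game.
q_net = QNetwork(num_actions=6)
print(q_net(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 6])
```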