This is a simple DQN implementation that works on CartPole-v0 with rendered pixels as input. It extends PyTorch's official DQN tutorial (which does not actually work as published): https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html.
It is tuned specifically for CartPole-v0 and may fail on other tasks.
Because DQN involves randomness (random replay sampling and random network initialization), training is not deterministic; you may have to restart a few times to get a satisfactory result.
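For context, the agent does not see the 4-dimensional CartPole state vector; observations are built from the rendered frame, in the spirit of the original tutorial's `get_screen` helper. Below is a minimal sketch of that idea; the exact cropping, resizing, and color handling used in this repo may differ.

```python
import gym
import numpy as np
import torchvision.transforms as T

# Sketch only: turn the rendered frame into a small image tensor, roughly
# as in the PyTorch DQN tutorial's get_screen(). The crop region, output
# size, and grayscale conversion here are illustrative assumptions.
resize = T.Compose([
    T.ToPILImage(),
    T.Grayscale(),
    T.Resize(40, interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
])

def get_screen(env):
    # Classic gym API: render(mode='rgb_array') returns an HxWx3 uint8 array.
    screen = env.render(mode='rgb_array')
    h = screen.shape[0]
    # Keep roughly the horizontal band of the image that contains the cart.
    screen = np.ascontiguousarray(screen[int(h * 0.4):int(h * 0.8)], dtype=np.uint8)
    return resize(screen).unsqueeze(0)  # shape: (1, 1, H', W')

if __name__ == '__main__':
    env = gym.make('CartPole-v0')
    env.reset()
    print(get_screen(env).shape)
```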
The following three trials were run with the same set of hyperparameters:
Trial | Max reward | Max 100-episode mean | Total episodes | Solved after (episodes) |
---|---|---|---|---|
0 | 1600 | 220 | 5000 | 3000 |
1 | 900 | 160 | 5000 | - |
2 | 2500 | 500 | 10000 | 700 |
The last column gives the episode at which the 100-episode mean reward first reached 200; "-" means it never did within the run.
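In other words, "solved" means the rolling mean over the last 100 episode rewards hits 200. A small sketch of that bookkeeping (illustrative, not necessarily the exact code in this repo):

```python
from collections import deque
import numpy as np

# Illustrative tracker for the "100-mean" metric: mean reward over the
# last 100 episodes, with the run counted as solved once that mean
# reaches 200.
class SolvedTracker:
    def __init__(self, window=100, threshold=200.0):
        self.rewards = deque(maxlen=window)
        self.threshold = threshold
        self.solved_at = None  # episode at which the threshold was first reached

    def update(self, episode, episode_reward):
        self.rewards.append(episode_reward)
        mean = float(np.mean(self.rewards))
        if (self.solved_at is None
                and len(self.rewards) == self.rewards.maxlen
                and mean >= self.threshold):
            self.solved_at = episode
        return mean
```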
Training reward history over the last 300 episodes:

Hyperparameters (shared across all trials):

Parameter | Value |
---|---|
Learning rate | 3e-5 |
Target net update (steps) | 200 |
Batch size | 256 |
Gamma | 1 |
Memory size | 10000 |
Memory alpha | 0.6 |
Memory beta start | 0.4 |
Memory beta frames | 10000 |
Epsilon start | 1.0 |
Epsilon end | 0.01 |
Epsilon decay | 10 |
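The last six rows parameterize an epsilon-greedy exploration schedule and a prioritized replay buffer (alpha/beta). How this repo consumes them exactly is not shown here, but the conventional schedules behind such parameters look roughly like the sketch below; the formulas, the step/episode units, and the hard target-network update are assumptions based on common DQN practice and the tutorial this code extends.

```python
import math

# Assumed values from the table above; the exact formulas and whether the
# decay counter is in steps or episodes are repo-specific assumptions.
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 10
BETA_START, BETA_FRAMES = 0.4, 10000
TARGET_UPDATE = 200  # "Target net update (steps)"

def epsilon(t):
    # Tutorial-style exponential decay from EPS_START down to EPS_END.
    return EPS_END + (EPS_START - EPS_END) * math.exp(-t / EPS_DECAY)

def beta(frame):
    # Linear annealing of the prioritized-replay importance-sampling
    # exponent from BETA_START toward 1.0 over BETA_FRAMES frames
    # (alpha = 0.6 controls how strongly priorities skew sampling).
    return min(1.0, BETA_START + frame * (1.0 - BETA_START) / BETA_FRAMES)

def maybe_update_target(step, policy_net, target_net):
    # Hard update: copy the policy network's weights into the target
    # network every TARGET_UPDATE optimization steps.
    if step % TARGET_UPDATE == 0:
        target_net.load_state_dict(policy_net.state_dict())
```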