
# Simple-DQN-Pytorch

This is a simplistic implementation of DQN that works on CartPole-v0 with rendered pixels as input. It extends PyTorch's official DQN tutorial (which does not actually converge as written): https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html.

It is tuned specifically for CartPole-v0 and may fail on other tasks.
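The README does not show the pixel-preprocessing pipeline. A common approach (the one used by the PyTorch tutorial this repo extends) is to render frames as RGB arrays and feed the network the difference of two consecutive frames, so velocity information becomes observable. The sketch below shows only the array manipulation; `preprocess` and `state_from_frames` are illustrative helper names, not code from this repo:

```python
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Convert an RGB frame of shape (H, W, 3) to a normalized grayscale array.

    In classic gym, `frame` would come from env.render(mode="rgb_array");
    here we only show the framework-agnostic array steps.
    """
    gray = frame.mean(axis=2)                 # naive RGB -> grayscale
    gray = gray[::2, ::2]                     # 2x spatial downsample by striding
    return (gray / 255.0).astype(np.float32)  # scale to [0, 1]

def state_from_frames(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """State = difference of consecutive frames (as in the PyTorch tutorial)."""
    return preprocess(curr) - preprocess(prev)

# Dummy 400x600 RGB frames (CartPole's default render size):
f0 = np.zeros((400, 600, 3), dtype=np.uint8)
f1 = np.full((400, 600, 3), 255, dtype=np.uint8)
s = state_from_frames(f0, f1)
print(s.shape)  # (200, 300)
```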

## Requirements

## Performance

Due to the randomness in the DQN algorithm (random sampling from replay memory and random network initialization), training is not deterministic. You may have to restart a few times to get a satisfying result.

The following three experiment results came from the same set of hyperparameters:

| Trial | Max reward | Max 100-mean | Total episodes | Solved after (episodes) |
|-------|-----------:|-------------:|---------------:|------------------------:|
| 0     | 1600       | 220          | 5000           | 3000                    |
| 1     | 900        | 160          | 5000           | -                       |
| 2     | 2500       | 500          | 10000          | 700                     |

The last column indicates the episode at which the 100-episode mean reward first reached 200.
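That "solved after" criterion can be sketched as a running 100-episode mean check. `solved_after` is an illustrative helper, not code from this repo (the `-` for trial 1 corresponds to `None` here):

```python
from collections import deque

def solved_after(rewards, window=100, threshold=200.0):
    """Return the first episode index at which the mean of the last
    `window` episode rewards reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, r in enumerate(rewards):
        recent.append(r)
        if len(recent) == window and sum(recent) / window >= threshold:
            return i
    return None

# Toy history: 100 weak episodes (reward 50), then 200 strong ones (reward 250).
history = [50.0] * 100 + [250.0] * 200
print(solved_after(history))  # 174
```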


Last 300 episode history of training:

(figure not available)

## Implementation

### Methods

### Hyper-parameters

| Parameter                 | Value |
|---------------------------|------:|
| Learning rate             | 3e-5  |
| Target net update (steps) | 200   |
| Batch size                | 256   |
| Gamma                     | 1     |
| Memory size               | 10000 |
| Memory alpha              | 0.6   |
| Memory beta start         | 0.4   |
| Memory beta frames        | 10000 |
| eps start                 | 1.0   |
| eps end                   | 0.01  |
| eps decay                 | 10    |
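The table does not state how these parameters enter the schedules. A plausible reading is sketched below, assuming the exponential epsilon schedule from the PyTorch DQN tutorial and the standard linear beta annealing used in prioritized experience replay; whether `step` counts environment steps or episodes here is an assumption:

```python
import math

def epsilon(step, start=1.0, end=0.01, decay=10):
    """Exponential epsilon-greedy schedule (the form in the PyTorch DQN
    tutorial): decays from `start` toward `end` with time constant `decay`."""
    return end + (start - end) * math.exp(-step / decay)

def per_beta(frame, beta_start=0.4, beta_frames=10000):
    """Linear annealing of the prioritized-replay importance-sampling
    exponent from beta_start to 1.0 over beta_frames, then held at 1.0."""
    return min(1.0, beta_start + frame * (1.0 - beta_start) / beta_frames)

print(epsilon(0))      # starts at eps start
print(per_beta(10000)) # fully annealed after beta frames
```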

## Other Observations