mrahtz / tensorflow-rl-pong

Pong AI trained using policy gradient-based reinforcement learning

TensorFlow reinforcement learning Pong agent

A Pong AI trained using policy gradients, implemented using TensorFlow and OpenAI Gym, based on Andrej Karpathy's Deep Reinforcement Learning: Pong from Pixels.

After 7,000 episodes of training, the result looks like:

Usage

First, install OpenAI Gym and TensorFlow.

Run without any arguments to train the AI from scratch. Checkpoints are saved periodically (set the interval with --checkpoint_every_n_episodes). Run with --load_checkpoint --render to see how an AI trained on ~8,000 episodes plays.

Installing Gym

OpenAI Gym provides an easy-to-use suite of reinforcement learning tasks. To install Gym, you will need a Python environment set up; Python 3.5 or later is recommended. Follow the steps below to install Gym:

Installing TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. Here's how to install TensorFlow on your machine:

pip install tensorflow

Understanding the Code

pong.py

policy_network.py

Vocabulary

Training Time

Changes from Andrej's Code

Lessons Learned

When you have a hypothesis that you want to test, think deliberately about what the cheapest way to test it is.

For example, for a while things weren't working, and while debugging I noticed that Andrej's code initialises its RMSProp gradient history with zeros, while TensorFlow initialises it with ones. I hypothesised that this was a key factor, and the test I came up with was to compile a custom version of TensorFlow with RMSProp initialised using zeros. It later occurred to me that a much cheaper test would have been to just change Andrej's code to initialise with ones instead.
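To get a feel for why the initial value might matter, here is a minimal sketch of a standard RMSProp update on a single scalar parameter, run once with the gradient history starting at zero and once starting at one. It is not code from this repository or from Andrej's script: the function name rmsprop_steps, the hyperparameter values and the constant gradient of 0.01 are all assumptions chosen for illustration.

```python
import math


def rmsprop_steps(grads, init_cache, learning_rate=1e-3, decay_rate=0.99, eps=1e-5):
    """Return the update RMSProp would apply for each gradient in turn.

    init_cache is the starting value of the running average of squared
    gradients (the "gradient history"). All defaults are illustrative
    assumptions, not values taken from this repository.
    """
    cache = init_cache
    steps = []
    for g in grads:
        # Exponential moving average of the squared gradient.
        cache = decay_rate * cache + (1 - decay_rate) * g ** 2
        # Scale the gradient by the root of that average.
        steps.append(learning_rate * g / (math.sqrt(cache) + eps))
    return steps


# Hypothetical small, constant gradient to isolate the effect of the start value.
grads = [0.01] * 5
steps_zeros = rmsprop_steps(grads, init_cache=0.0)
steps_ones = rmsprop_steps(grads, init_cache=1.0)

for z, o in zip(steps_zeros, steps_ones):
    print(f"zeros-init step: {z:.6f}   ones-init step: {o:.6f}")
```

With a small gradient, the zero-initialised history produces much larger early updates than the one-initialised history, because the denominator starts near zero rather than near one. Changing a single initial value in a short script like this is the kind of cheap test described above.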

It may help to acknowledge explicitly to yourself when you have a hypothesis you want to test, rather than just trying things out at random in a state of flow.