A simplified, highly flexible, commented, and (hopefully) easy-to-understand implementation of self-play reinforcement learning, based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player, turn-based adversarial game and any deep learning framework of your choice. A sample implementation is provided for the game of Othello in PyTorch and Keras. An accompanying tutorial can be found here. We also have implementations for many other games, such as GoBang and TicTacToe.
To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/{pytorch,keras}/NNet.py.
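As a rough illustration of what implementing the game functions involves, here is a minimal, standalone sketch for a 3x3 TicTacToe. The class name TicTacToeSketch is hypothetical, and the method names are assumptions about the interface declared in Game.py; that file and othello/OthelloGame.py remain the authoritative reference. In the repository, the class would subclass the base class from Game.py rather than stand alone.

```python
import numpy as np


class TicTacToeSketch:
    """Illustrative 3x3 game. Method names are assumptions about Game.py."""

    n = 3

    def getInitBoard(self):
        # Empty board: 0 = empty, 1 = first player, -1 = second player.
        return np.zeros((self.n, self.n), dtype=int)

    def getActionSize(self):
        # One action per square.
        return self.n * self.n

    def getValidMoves(self, board, player):
        # Binary mask over actions: 1 where the square is still empty.
        return (board.reshape(-1) == 0).astype(int)

    def getNextState(self, board, player, action):
        # Return a new board with the move applied, plus the next player.
        nxt = board.copy()
        nxt[divmod(action, self.n)] = player
        return nxt, -player

    def getGameEnded(self, board, player):
        # 0 while the game is running, +1 / -1 if `player` won / lost,
        # and a small nonzero value for a draw.
        lines = list(board) + list(board.T)
        lines += [board.diagonal(), np.fliplr(board).diagonal()]
        for line in lines:
            s = int(np.sum(line))
            if abs(s) == self.n:
                return 1 if s == self.n * player else -1
        return 0 if (board == 0).any() else 1e-4

    def getCanonicalForm(self, board, player):
        # View the board from the perspective of the player to move.
        return player * board
```

A short usage example: `game = TicTacToeSketch()`, then `board, player = game.getNextState(game.getInitBoard(), 1, 4)` places the first player's piece in the centre and returns `-1` as the next player.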
Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for the self-play can be specified in main.py. Additional neural network parameters are in othello/{pytorch,keras}/NNet.py (cuda flag, batch size, epochs, learning rate etc.).
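To give a feel for the search step, the sketch below shows the PUCT selection rule described in the AlphaGo Zero paper, where each action is scored as Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The function names and the `stats` dictionary layout are hypothetical and chosen for illustration only; they are not taken from MCTS.py, which may organise its statistics differently.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.0):
    # Exploitation term (q) plus an exploration bonus that grows with the
    # network prior and shrinks as the action is visited more often.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_action(stats, c_puct=1.0):
    """Pick the action with the highest PUCT score.

    `stats` maps action -> (Q value, prior probability, visit count)
    for a single state (hypothetical layout).
    """
    parent_visits = sum(n for _, _, n in stats.values())
    return max(
        stats,
        key=lambda a: puct_score(
            stats[a][0], stats[a][1], parent_visits, stats[a][2], c_puct
        ),
    )
```

With `c_puct=0` the rule reduces to picking the action with the highest Q value; larger values shift the search towards actions that the network prior favours but that have been visited less.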
To start training a model for Othello:
python main.py
Choose your framework and game in main.py.
For easy environment setup, you can use nvidia-docker. Once nvidia-docker is set up, simply run:
./setup_env.sh
to set up a Jupyter Docker container (default: PyTorch). You can then open a new terminal and enter:
docker exec -ti pytorch_notebook python main.py
We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model (PyTorch) can be found in pretrained_models/othello/pytorch/. You can play a game against it using pit.py. Below is the performance of the model against a random and a greedy baseline as a function of the number of training iterations.
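For readers unfamiliar with the two baselines, the following is a minimal sketch of how a random and a greedy opponent might choose moves. The function names, the `valid_mask` argument, and the `score_after` callback are all hypothetical; the baselines actually used for the comparison may differ in detail, for example in how the greedy score is computed.

```python
import random


def random_move(valid_mask, rng=random):
    # Pick uniformly among the actions marked as valid (nonzero) in the mask.
    valid = [a for a, ok in enumerate(valid_mask) if ok]
    return rng.choice(valid)


def greedy_move(valid_mask, score_after):
    # Pick the valid action whose immediate resulting score is highest.
    # `score_after(action)` is an assumed callback returning that score.
    valid = [a for a, ok in enumerate(valid_mask) if ok]
    return max(valid, key=score_after)
```

The greedy player looks only one move ahead, so it is a stronger reference point than the random player but still far from optimal play.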
A concise description of our algorithm can be found here.
If you found this work useful, feel free to cite it as:
@misc{thakoor2016learning,
title={Learning to play othello without human knowledge},
author={Thakoor, Shantanu and Nair, Surag and Jhunjhunwala, Megha},
year={2016},
publisher={Stanford University, Final Project Report}
}
While the current code is fairly functional, we could benefit from the following contributions:
Game logic files for more games that follow the specifications in Game.py, along with their neural networks
Some extensions have been implemented here.
Note: Chainer and TensorFlow v1 versions have been removed but can be found prior to commit 2ad461c.