suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
MIT License
3.74k stars, 1.01k forks

Not pure RL #286

itschenxi closed this issue 1 year ago

itschenxi commented 1 year ago

It seems your model is a mixture of supervised learning and reinforcement learning, not pure RL as described in the AlphaZero paper.

suragnair commented 1 year ago

Quoting the README: “A simplified, highly flexible, commented and (hopefully) easy to understand implementation of self-play based reinforcement learning based on the AlphaGo Zero paper (Silver et al)”
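For readers weighing the "supervised vs. RL" question, it may help to look at the per-position loss described in the AlphaGo Zero paper: a squared error on the value plus a cross-entropy on the policy. The form looks supervised, but both targets come from self-play (the MCTS visit distribution and the final game outcome) rather than from labelled human data. The snippet below is only a minimal sketch of that loss and is not taken from this repository; the function name `alphazero_style_loss` and the numbers are hypothetical, and the paper's L2 weight-regularization term is omitted.

```python
import math


def alphazero_style_loss(pi, p, z, v, eps=1e-12):
    """Hypothetical helper: loss for a single self-play position.

    pi: policy target from MCTS visit counts (sums to 1)
    p:  policy predicted by the network (sums to 1)
    z:  final game outcome from the current player's view (-1, 0 or +1)
    v:  value predicted by the network, in [-1, 1]
    """
    # Squared error between the self-play outcome and the predicted value.
    value_loss = (z - v) ** 2
    # Cross-entropy between the MCTS policy and the predicted policy;
    # eps guards against log(0).
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_loss + policy_loss


# Illustrative numbers for a position with three legal moves (assumed values).
pi = [0.7, 0.2, 0.1]   # would come from MCTS during self-play
p = [0.5, 0.3, 0.2]    # would come from the network
loss = alphazero_style_loss(pi, p, z=1.0, v=0.4)
print(loss)
```

Under this reading, the gradient step resembles supervised learning, while the data-generation loop that produces `pi` and `z` is the self-play reinforcement learning the README refers to.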