Closed itschenxi closed 1 year ago
It seems your model is a mixture of supervised learning and reinforcement learning, not pure RL as described in the AlphaZero paper.
Quoting the README: “A simplified, highly flexible, commented and (hopefully) easy to understand implementation of self-play based reinforcement learning based on the AlphaGo Zero paper (Silver et al)”