junxiaosong / AlphaZero_Gomoku

An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
MIT License
3.23k stars 962 forks source link

Keras implementation is not RL #132

Open itschenxi opened 1 year ago

itschenxi commented 1 year ago

It seems the Keras implementation does not use policy gradient algorithm (reinforcement learning), instead it uses supervised learning. The Pytorch implementation uses reinforcement learning. Why?