jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

The strength of the Mancala bot #110

Closed: DmitriMedved closed this issue 2 years ago

DmitriMedved commented 2 years ago

Hello. It says here that "Currently, the network alone has a ~20% win rate against a vanilla MCTS agent (the full AlphaZero agent has a win rate of about 90%)".

20 percent is a very low win rate. May I know the reason for it? Is it because the network requires many more training games, or do we need to change the architecture of the neural network?

Thank you for the reply.

jonathan-laurent commented 2 years ago

The reason here is that I've never taken the time to tune AlphaZero.jl for Mancala. All I did was implement the game and copy the hyperparameters from connect four. I believe it is pretty safe to say that Mancala is a significantly harder game, so I wouldn't be surprised if different hyperparameters and/or a longer training time were needed.
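
For concreteness, here is a rough sketch of where such tuning could start. The field names follow AlphaZero.jl's `MctsParams` struct, but the specific values below are illustrative guesses to experiment with, not tested settings for Mancala:

```julia
using AlphaZero

# Illustrative starting point for Mancala-specific tuning.
# These values are guesses, not validated settings.
mancala_mcts = MctsParams(
  num_iters_per_turn = 600,         # try more search per move than connect four
  cpuct = 2.0,                      # exploration constant
  temperature = ConstSchedule(1.0), # sampling temperature during self-play
  dirichlet_noise_ϵ = 0.25,
  dirichlet_noise_α = 1.0)
```

Increasing `num_iters_per_turn` and the total number of training iterations would be the first things I would try, at the cost of longer training time.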

Also, to be clear, the 20% win rate is for the network alone, so this score is not that bad against a search-based strategy that explores thousands of scenarios. A 20% win rate means the network is competitive; this number would be 0% in the initial stages of training.

The full AlphaZero agent with MCTS and the network is of course much stronger.
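
To quantify both numbers, the `Benchmark` module can pit each configuration against a vanilla MCTS baseline. Below is a sketch modeled on the connect four example parameters; the exact field names and values are assumptions here and may differ across library versions:

```julia
using AlphaZero

# Evaluation game settings (illustrative values).
benchmark_sim = SimParams(
  num_games = 200, num_workers = 8, batch_size = 8,
  use_gpu = true, reset_every = 1,
  flip_probability = 0.5, alternate_colors = true)

# Vanilla MCTS baseline: pure rollouts, no neural network.
baseline = Benchmark.MctsRollouts(
  MctsParams(
    num_iters_per_turn = 1000,
    cpuct = 1.0,
    temperature = ConstSchedule(0.2),
    dirichlet_noise_ϵ = 0.0,
    dirichlet_noise_α = 1.0))

# MCTS settings used by the full agent during evaluation.
full_mcts = MctsParams(
  num_iters_per_turn = 600, cpuct = 2.0,
  temperature = ConstSchedule(0.2),
  dirichlet_noise_ϵ = 0.0, dirichlet_noise_α = 1.0)

benchmark = [
  # Full agent (MCTS + network) vs vanilla MCTS: ~90% win rate
  Benchmark.Duel(Benchmark.Full(full_mcts), baseline, benchmark_sim),
  # Raw network (no search) vs vanilla MCTS: ~20% win rate
  Benchmark.Duel(Benchmark.NetworkOnly(), baseline, benchmark_sim)]
```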

That being said, the current Mancala agent can certainly be tuned and improved with a little work, and I would gladly welcome a PR if you are interested.

DmitriMedved commented 2 years ago

Thank you for your reply. How can I run the full AlphaZero agent (MCTS plus the network) against a pretty strong Mancala engine that I have (minimax with alpha-beta pruning and hand-written heuristics)?
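
Something like the following is what I have in mind, based on the `AbstractPlayer`/`think` interface from the documentation. Here `my_minimax_move` is just a placeholder for my own engine, and I am not sure the session-loading calls are exactly right across versions:

```julia
using AlphaZero

# Placeholder for my existing engine: must return a legal action
# for the given game state. (Hypothetical hook, not part of AlphaZero.jl.)
function my_minimax_move(game) end

# Wrap the external engine as a player AlphaZero.jl can use.
struct MinimaxPlayer <: AbstractPlayer end

function AlphaZero.think(::MinimaxPlayer, game)
  actions = GI.available_actions(game)
  best = my_minimax_move(game)
  # Put all probability mass on the engine's chosen move.
  π = [a == best ? 1.0 : 0.0 for a in actions]
  return actions, π
end

# Load the trained Mancala agent (loading API may differ by version).
experiment = Examples.experiments["mancala"]
session = Session(experiment)
azplayer = AlphaZeroPlayer(session.env)  # assuming this convenience constructor exists

# Play one game, with the AlphaZero agent moving first.
trace = play_game(experiment.gspec, TwoPlayers(azplayer, MinimaxPlayer()))
```

Would running many such games and counting rewards from the returned traces be the right way to estimate a win rate?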