Closed DmitriMedved closed 2 years ago
The reason here is that I've never taken the time to tune AlphaZero.jl for Mancala. All I did was basically implement the game and copy the hyperparameters from Connect Four. I believe it is pretty safe to say that Mancala is a significantly harder game, so I wouldn't be surprised if different hyperparameters and/or a longer training time were needed.
Also, to be clear, the 20% win rate is for the network alone, so this score is not that bad against a search-based strategy that explores thousands of scenarios per move. A 20% win rate means the network is competitive; this number would be 0% in the initial stages of training.
The full AlphaZero agent with MCTS and the network is of course much stronger.
That being said, the current Mancala agent can certainly be tuned and improved with a little work and I would gladly welcome a PR if you are interested.
Thank you for your reply. How can I run the full AlphaZero agent with MCTS and the network against a pretty strong Mancala engine that I have (minimax with alpha beta pruning and hand-written heuristics)?
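For what it's worth, the general pattern for this kind of evaluation (independent of AlphaZero.jl's exact entry points, which I won't guess at here) is an arena loop: wrap each engine behind a common "given a state, return a move" interface, play a batch of games, and tally the results. A minimal, self-contained sketch on a toy Nim-style game, with an alpha-beta minimax player standing in for the hand-written engine (all names here are illustrative, not AlphaZero.jl API):

```python
import random

def moves(n):
    # Legal moves: take 1, 2, or 3 stones (toy Nim-style game).
    return [m for m in (1, 2, 3) if m <= n]

def alphabeta(n, alpha, beta):
    # Negamax with alpha-beta pruning. Returns the score for the
    # player to move: +1 = forced win, -1 = forced loss.
    if n == 0:
        return -1  # the previous player took the last stone; mover has lost
    best = -1
    for m in moves(n):
        score = -alphabeta(n - m, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # prune: opponent will never allow this line
    return best

def minimax_player(n):
    # Stand-in for a hand-written minimax engine: pick the move
    # with the best alpha-beta score.
    best_m, best_s = None, -2
    for m in moves(n):
        s = -alphabeta(n - m, -1, 1)
        if s > best_s:
            best_m, best_s = m, s
    return best_m

def random_player(n):
    # Stand-in for the weaker agent being evaluated.
    return random.choice(moves(n))

def play(first, second, n=10):
    # Play one game; return 0 if `first` wins, 1 if `second` wins.
    players = (first, second)
    turn = 0
    while n > 0:
        n -= players[turn % 2](n)
        turn += 1
    return (turn - 1) % 2  # the player who took the last stone wins

def arena(p1, p2, games=50):
    # Win rate of p1 over a batch of games.
    return sum(1 for _ in range(games) if play(p1, p2) == 0) / games
```

In practice you would replace `minimax_player` with a thin wrapper that calls your engine, and `random_player` with a wrapper that queries the trained AlphaZero agent (MCTS plus network); the loop itself stays the same. To be fair, you would also alternate which engine moves first across games.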
Hello, it says here that "Currently, the network alone has a ~20% win rate against a vanilla MCTS agent (the full AlphaZero agent has a win rate of about 90%)".
20 percent is a very low rate. May I know the reason for it? Is it because the network requires many more training games, or do we need to change the architecture of the neural network?
Thank you for the reply.