jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

Training Stuck at 'Network Only against MinMax (depth 6)' after Modifying TicTacToe Board Size to 4x4 #180

Closed · solidcub closed this issue 1 year ago

solidcub commented 1 year ago

Hello, thank you so much for creating such an amazing project. I recently started using GitHub because of it. I am currently studying the Connect-4 and TicTacToe code so that I can train an agent for my own game with AlphaZero.

As a first step, I changed `const BOARD_SIDE = 3` to `const BOARD_SIDE = 4` in TicTacToe's `game.jl` and created a new game name for it.
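Concretely, the edit looks like this (the `NUM_POSITIONS` line is from the stock `games/tictactoe/game.jl` and updates automatically, since it is derived from `BOARD_SIDE`):

```julia
# games/tictactoe/game.jl -- the only line I changed:
const BOARD_SIDE = 4                  # was 3
# derived constant from the stock file, updated automatically:
const NUM_POSITIONS = BOARD_SIDE ^ 2  # 16 cells instead of 9
```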

julia --project -e 'using AlphaZero; Scripts.explore("new-game")'

Exploring the game works fine on the 4x4 board.

julia --project -e 'using AlphaZero; Scripts.train("new-game")'

However, when I try to train using the above command:

Running benchmark: AlphaZero against MCTS (400 rollouts)

This benchmark completes without issue, reaching 100%.

Running benchmark: Network Only against MinMax (depth 6)

After printing the above message, the process stops.

I am unsure what is causing this and would appreciate guidance on how to proceed. I apologize for the basic question; the code is well organized and great for learning. I look forward to your response.

[Screenshot attachment: new_game]

jonathan-laurent commented 1 year ago

Thanks for reaching out! I suspect you are using too large a depth for your minmax baseline. For standard tictactoe, the branching factor is small enough that exhaustive exploration at depth 6 is feasible. However, if you enlarge the grid to $4 \times 4$, depth 6 means visiting up to $16 \times 15 \times 14 \times 13 \times 12 \times 11 \approx 6 \times 10^6$ nodes from the root, which is probably too expensive.
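To make the arithmetic concrete, here is a quick sanity check (`minmax_nodes` is just an illustrative helper, not part of the library):

```julia
# Upper bound on the number of nodes an exhaustive depth-d minmax
# visits from an empty board with `cells` free squares (this ignores
# branches cut off by early wins, so the true count is a bit lower).
minmax_nodes(cells, depth) = prod(cells - i for i in 0:depth-1)

minmax_nodes(9, 6)   # 3x3 board:  9*8*7*6*5*4           = 60_480
minmax_nodes(16, 6)  # 4x4 board: 16*15*14*13*12*11      = 5_765_760
minmax_nodes(16, 4)  # 4x4 board at depth 4: 16*15*14*13 = 43_680
```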

solidcub commented 1 year ago

I adjusted the `MinMaxTS` depth from 6 to 4 in `params.jl`, and now everything works as expected.
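For anyone else hitting this, the change was along these lines (sketched from the stock benchmark setup; `benchmark_sim` is the SimParams value already defined in `params.jl`, and the exact arguments may differ by version):

```julia
# games/<my-game>/params.jl -- lower the minmax benchmark depth:
Benchmark.Duel(
  Benchmark.NetworkOnly(),
  Benchmark.MinMaxTS(depth=4),  # was depth=6: too deep for 16 cells
  benchmark_sim)                # simulation params defined above
```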

I have two more questions:

  1. If I purchase an RTX 4090, can I fully utilize its performance for training with AlphaZero.jl? Also, I assume it's not possible to train using a MacBook Pro with an M2 processor, correct?
  2. Would it be possible to use the parameters learned with AlphaZero.jl in an indie game I plan to develop in the future?

I'm still learning, but I hope to contribute to this project in the future as my knowledge grows.

Many thanks!

[Screenshot attachment: ttt4]

jonathan-laurent commented 1 year ago

> If I purchase an RTX 4090, can I fully utilize its performance for training with AlphaZero.jl? Also, I assume it's not possible to train using a MacBook Pro with an M2 processor, correct?

An RTX 4090 should indeed work out of the box. GPU utilization is a function of your hyperparameters: some choices will keep the GPU (almost) fully utilized while others won't. Also, one of my plans for the next release is a full-GPU option where everything, including tree search and environment simulation, runs on the GPU; an RTX 4090 would fully leverage this. Regarding macOS support, it is not going to work out of the box. It might (or might not) be possible to add a Metal.jl backend, but this is not a priority for me.
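To give a rough idea of which knobs matter most: the self-play simulation batching settings drive GPU utilization. A sketch, with field names following the stock connect-four `params.jl` (illustrative, untuned values; `mcts_params` is a placeholder for an MctsParams value defined elsewhere; verify the fields against your version):

```julia
# Bigger inference batches and more parallel workers keep the GPU busier.
self_play = SelfPlayParams(
  sim=SimParams(
    num_games=5000,
    num_workers=128,   # concurrent game simulations feeding inference
    batch_size=64,     # positions sent to the GPU per inference call
    use_gpu=true,
    reset_every=2,
    flip_probability=0.,
    alternate_colors=false),
  mcts=mcts_params)    # placeholder: MCTS settings defined elsewhere
```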

> Would it be possible to use the parameters learned with AlphaZero.jl in an indie game I plan to develop in the future?

AlphaZero.jl is MIT-licensed, so you can do pretty much anything with it, including commercial applications. Just make sure to cite AlphaZero.jl in your project. :-)

solidcub commented 1 year ago

Much appreciated!