Open smart-fr opened 1 year ago
There is no perfect answer here. Also, note that AlphaZero was not designed to work optimally with every possible cluster configuration out-of-the-box. You may need to do some tweaking to achieve the performance you want on your hardware.
One of the main factors that determines how much compute you will need is the branching factor of your game (along with the average number of moves in a game). For example, connect four has a maximum branching factor of 7 and a game of connect four usually lasts ~30 moves. Connect four is about the level of difficulty you can solve easily on commodity hardware (one gaming laptop with a decent GPU).
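For a rough sense of why these two factors matter so much, here is a back-of-envelope sketch (my illustration, not from the thread); the constant branching factor is a simplification:

```python
# Back-of-envelope upper bound on distinct playouts, assuming
# (as a simplification) a constant branching factor b over d moves.
def playout_upper_bound(branching_factor: int, game_length: int) -> int:
    return branching_factor ** game_length

# Connect four: branching factor <= 7, typical game ~30 moves.
bound = playout_upper_bound(7, 30)
```

The true game tree is smaller (the branching factor shrinks as columns fill up), but the exponential dependence on both numbers is why a slightly "bigger" game can require vastly more compute.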
AlphaZero being sample inefficient, the amount of required compute can scale really fast with the complexity of your game. Depending on your hardware, you can invest this compute differently:
What the best tradeoff is depends on your available hardware, your specific use-case, how costly it is to simulate your environment...
Finally, the best way to make AlphaZero suitable for challenging games without spending too much compute is to initialize the policy with a decent heuristic (possibly learned from human data with supervised learning). In practice, this considerably reduces your effective branching factor, since only actions that are not clearly stupid will be considered most of the time.
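A minimal sketch of that pruning idea, with a hypothetical `heuristic` function standing in for a policy pretrained on human data:

```python
# Keep only actions whose heuristic score is close to the best one;
# everything "clearly stupid" is pruned before search even considers it.
def plausible_actions(actions, heuristic, tolerance=0.2):
    scores = {a: heuristic(a) for a in actions}
    best = max(scores.values())
    return [a for a in actions if scores[a] >= best - tolerance]

# e.g. with three actions scored 1.0, 0.9 and 0.1,
# only the first two survive with the default tolerance
```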
Thank you for your reply. My game has a huge branching factor. 😨 Filtering the legal actions mask using a decent heuristic is definitely on my list.
Re: compute investment strategy, if I want to explore using larger networks vs more MCTS simulations, what are the main parameters I should play around with, which don't require a deep understanding of all the under-the-hood mechanisms? I guess in `params.jl` these may be the `num_filters`, `num_blocks` and `conv_kernel_size` arguments of the `NetLib.ResNetHP()` function, the `num_iters_per_turn` argument of the `MctsParams()` function, and the `num_iters` argument of the `Params()` function?
Would you recommend some readings about this question?
Also, if your branching factor is huge you will be penalized, because as it is coded, AlphaZero.jl uses all possible moves. For example, using AlphaZero.jl for chess would need to store more than 1800 moves, policy values, etc., whereas in a given position you have at most around 250 possible moves. So you are wasting a lot of memory, which I think prevents training such games without a huge amount of RAM. (I tried for the game Ataxx; it is very slow, because you can't play that many games in parallel.) It is quite easy to fix (e.g. storing the move or a move id in actions and retaining only valid actions instead of masking).
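The two storage schemes being contrasted can be sketched like this (the sizes are illustrative, taken loosely from the chess numbers above, not AlphaZero.jl's actual data structures):

```python
NUM_GLOBAL_ACTIONS = 1800   # chess-like fixed global action space (illustrative)
legal = list(range(250))    # at most ~250 legal moves in a given position

# Dense scheme: one entry per global action, mostly masked out.
dense_policy = [0.0] * NUM_GLOBAL_ACTIONS
for move_id in legal:
    dense_policy[move_id] = 1.0 / len(legal)

# Sparse scheme: store (move_id, prior) pairs for legal moves only,
# cutting per-node memory by roughly NUM_GLOBAL_ACTIONS / len(legal).
sparse_policy = [(move_id, 1.0 / len(legal)) for move_id in legal]
```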
@fabricerosay You are perfectly right. The reason I made this implementation choice initially is that any problem with a branching factor large enough for this to matter is probably not learnable from scratch using a reasonable amount of compute. This no longer holds when initializing the policy from supervised learning though, so I may indeed want to lift this restriction.
I started working on alphazero again: a new implementation more in line with AlphaGPU but not wholly on GPU (I dropped struct nodes etc. for an SoA implementation), adding an NN cache, and on connect4 I saw a huge performance gain: 4096 games, 600 rollouts with a 128x5 resnet in under 5 minutes.
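For a rough sense of what those numbers imply (my arithmetic, assuming roughly one network evaluation per rollout and ~30 moves per connect four game):

```python
# Implied network-evaluation throughput of the benchmark above (rough estimate).
def evals_per_second(num_games, avg_moves, rollouts_per_move, seconds):
    return num_games * avg_moves * rollouts_per_move / seconds

# 4096 games x ~30 moves x 600 rollouts in 5 minutes -> ~250k evals/s
rate = evals_per_second(4096, 30, 600, 300)
```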
Using a heuristic to artificially prevent (mask) the dumbest actions after `GI.play!()`, I could train an agent within an acceptable delay on my PC.
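The combination itself is trivial; a sketch of how a heuristic mask can be intersected with the legal-actions mask (a hypothetical helper, not AlphaZero.jl code):

```python
# An action survives only if it is legal AND the heuristic allows it.
def combine_masks(legal_mask, heuristic_mask):
    return [l and h for l, h in zip(legal_mask, heuristic_mask)]
```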
I also tried to run the training on a multi-GPU VM, with almost no gain, since AlphaZero.jl seems to use only one GPU. Is this by design, or should I tweak some settings?
Interesting. Does your system offer an API similar to AlphaZero.jl's `GameInterface`?
No, it is different, very experimental, and miles away from AlphaZero.jl in terms of coding quality. It is not as generic, but it is probably faster. If you were to use it, you would have to dig into the ugly, uncommented code. Very amateurish work, by an amateur (which I am).
Does AlphaZero.jl take advantage of multiple GPUs on a single machine, or is a cluster of single-GPU machines the only way to parallelize GPU computing? If both ways are possible:
AlphaZero.jl cannot leverage multi-GPU machines out-of-the-box but making it do so would probably only require a small change.
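One simple shape such a change often takes elsewhere (a hypothetical sketch, not how AlphaZero.jl works today) is sharding self-play workers across devices round-robin:

```python
# Assign self-play workers to GPUs round-robin; each worker would then
# run its network inference on its assigned device.
def assign_devices(num_workers: int, num_gpus: int) -> list:
    return [worker % num_gpus for worker in range(num_workers)]

# 8 workers over 2 GPUs alternate between device 0 and device 1
```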
It would be super helpful for the exploration of the framework and its possibilities to have some list of rules and constraints linking the parameters, the output indicators and the system hardware characteristics. For example (I don't know if these are true or false):
etc.
If everyone could contribute their observations, it would be a useful start.
You are perfectly right and this would be useful. Also, it would be great if someone could contribute such a section to the documentation.
More generally, I am regularly thinking about what a smart framework could look like that performs as much autotuning as possible given one's configuration, makes hyperparameter sanity checks and even suggests relevant hyperparameter variants. This is an open research question though, and in any case I am skeptical that an algorithm as complex and computationally demanding as AlphaZero can ever be used as a black box.
I understand the complexity of the question of a self-adapting framework, and the value of any solution which would get us closer to this goal. From my amateur point of view, this is simply way beyond my power.
But believe it or not, I was able to create a fairly good agent playing my game, in nominal conditions (16x16 board), without any deep knowledge of the under-the-hood mechanics, "just" by coding my game's rules according to the `GameInterface` and with a little bit of parameter tweaking (in particular, skipping benchmark play altogether, and reducing the batch sizes and the memory buffer size).
I wouldn't call this "black-box" usage, but it demonstrates the great versatility of this framework you created following DeepMind's guidelines.
Hihi,
Do you have a rule of thumb that could be used to determine which hardware sizing would be required to train an agent, given the size and complexity of a game model? In terms of CUDA cores, GPU memory, CPU memory, Mflops, or whatever unit could help configure the hardware before starting to `dummy_run` a game?