Open fuhaoda opened 3 years ago
Hi @fuhaoda, thank you! I'm not entirely sure what you mean by "two layers" here, but let me try to explain.
What you're trying to do is select the next move from your current position. In the code above root
is set to this position. Then, for num_rounds
you go down the tree by traversing existent children (note the while
in the select_child
part). If you don't have any children in the existing tree, you expand once. From there you simulate the game until the end and propagate the results back up. So, you can see that this algorithm explores a lot of paths through the tree. The more children you already have, the deeper you go during search (i.e. not just two layers).
Max - Thank you for your reply. My question is about two level of nodes. The first one is from root (current position) to select_child (this is a node from root), then we can at most add one more layer of node (i.e. add_random_child()). Then from there, all the things will depend on simulate_random_game. Is my understanding correct? Are there any benefit to consider to add more or less layers of nodes?
Thank you for helping me to understanding this. I like the way that you organized your book to start from simple illustrations. Please keep writing more excellent books!
Hi fuhaoda, just to re-iterate what Max said, since node = self.select_child(node)
is called in a loop, you may call select_child
an arbitrary number of times. So the final value of node
could be any number of steps down the tree from the root.
After that, each round will add exactly one more layer of node (that's the add_random_child
as you called out), but that new node may be added to any branch. So the tree tends to grow unevenly, like this:
o
/ \
o o
/ \
o o
\
o
\
o
\
o
The point of MCTS is that the branch representing the best sequence of moves for both players will (eventually) have the most layers. The farther down you read out a particular line of game play, the better your assessment of that specific line of play will be. The tradeoff is that it has not evaluated alternatives as thoroughly, so it may miss good moves in other branches.
Hope that helps!
This is a great point. Now I see it. Thank you!!
I'm also confused by this line while (not node.can_add_child()) and (not node.is_terminal()):
.
At the beginning of the round, node is root, node.can_add_child()
will be true, and not node.can_add_child()
will be false, so it will not enter the while loop. So the next node.add_random_child()
will be run once only, therefore only 1 level below the root.
After that it'll be next round, so each round it only checks 1 level below the root.
What did I miss?
PS, unless the num_rounds
are so many, that it exhausted all the legal moves of the current state (at most board size * board size
, and increasingly less?), and still carry on rounds, in which case, it will enter the while loop, and start new levels down.
Hi @arisliang, I don't think you've missed anything. As you noted, when it's exhausted the legal moves of the current state, it starts to go deeper. However, after the first layer, the tree starts to expand unevenly. So the subtree under move A could go much deeper than the subtree under move B, if move A tends to get a higher estimated winrate. You do need quite a few rounds to make this work -- the pure python implementation in the book is probably too slow to be useful on a 19x19 board.
btw, chapter 14 describes an improvement on the search algorithm, where unvisited children are assigned a "default" winrate. Then you can include unvisited children in the UCT calculation, and only hit them up when the UCT score indicates they're sufficiently interesting. But to make this work well, it needs a neural network to prioritize the moves to explore first
Regarding the uneven tree expansion part, how likely select_child
returns different child? e.g. if it returns child child-1
at some round, will it tend to still return child-1
or more likely to return a different child child-2
in the next round?
Because select_child
assign the start node of the subtree to expand, if it returns the same child child-1
everytime, then random child will be added only below child-1
subtree.
In case, if select_child
returns the same child everytime, the algorithm will only explore all children under root
, plus all children under child-1
Oh on another look, since select_child
is in a while loop, even if it returns same child under the root, once exhausted the child's legal moves, it will select a grand child and start a new subtree two levels down. Overtime, it'll explore more levels down regardless whether it returns same child each time, as long as there're still rounds left to exhaust the legal moves for the selected subtrees.
I still don't quite get the uneven part though. In order for the tree expansion to be uneven, node = self.select_child(node)
needs to be called again and again in each round, before adding random children.
But it's not possible for select_child
to be called again and again in each round, because each newly selected node
have not visited any child yet, so they should always have random children to add, therefore would most likely exit the while's loop immediately.
So for each round, the while loop always tend to run once only and move down to add_random_child
. This way the tree tends to be even, since it needs to exhaust all legal moves at each level, before it can expand the tree deeper.
I think I get it now, even the node can't go deeper without visiting all its children, it'll tend to generate uneven trees.
Hi,
Excellent book! I was reading the book while look at the codes. In chapter 4, we talked about MTCS. In the book, MTCS explored a few layers of the tree. But the following codes in each round, it only explores at most two layer (due to the fact to reset
node = root
)? Am I missing something or this is an intention in the codes? Thanks for the help!