maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go
989 stars 390 forks source link

MTCS only two layer? #86

Open fuhaoda opened 3 years ago

fuhaoda commented 3 years ago

Hi,

Excellent book! I was reading the book while look at the codes. In chapter 4, we talked about MTCS. In the book, MTCS explored a few layers of the tree. But the following codes in each round, it only explores at most two layer (due to the fact to reset node = root )? Am I missing something or this is an intention in the codes? Thanks for the help!

# tag::mcts-rounds[]
        for i in range(self.num_rounds):
            node = root
            while (not node.can_add_child()) and (not node.is_terminal()):
                node = self.select_child(node)

            # Add a new child node into the tree.
            if node.can_add_child():
                node = node.add_random_child()

            # Simulate a random game from this node.
            winner = self.simulate_random_game(node.game_state)

            # Propagate scores back up the tree.
            while node is not None:
                node.record_win(winner)
                node = node.parent
# end::mcts-rounds[]
maxpumperla commented 3 years ago

Hi @fuhaoda, thank you! I'm not entirely sure what you mean by "two layers" here, but let me try to explain.

What you're trying to do is select the next move from your current position. In the code above root is set to this position. Then, for num_rounds you go down the tree by traversing existent children (note the while in the select_child part). If you don't have any children in the existing tree, you expand once. From there you simulate the game until the end and propagate the results back up. So, you can see that this algorithm explores a lot of paths through the tree. The more children you already have, the deeper you go during search (i.e. not just two layers).

fuhaoda commented 3 years ago

Max - Thank you for your reply. My question is about two level of nodes. The first one is from root (current position) to select_child (this is a node from root), then we can at most add one more layer of node (i.e. add_random_child()). Then from there, all the things will depend on simulate_random_game. Is my understanding correct? Are there any benefit to consider to add more or less layers of nodes?

Thank you for helping me to understanding this. I like the way that you organized your book to start from simple illustrations. Please keep writing more excellent books!

macfergus commented 3 years ago

Hi fuhaoda, just to re-iterate what Max said, since node = self.select_child(node) is called in a loop, you may call select_child an arbitrary number of times. So the final value of node could be any number of steps down the tree from the root.

After that, each round will add exactly one more layer of node (that's the add_random_child as you called out), but that new node may be added to any branch. So the tree tends to grow unevenly, like this:

  o
 /  \
o    o
    / \
   o   o
        \
         o
          \
           o
            \
             o

The point of MCTS is that the branch representing the best sequence of moves for both players will (eventually) have the most layers. The farther down you read out a particular line of game play, the better your assessment of that specific line of play will be. The tradeoff is that it has not evaluated alternatives as thoroughly, so it may miss good moves in other branches.

Hope that helps!

fuhaoda commented 3 years ago

This is a great point. Now I see it. Thank you!!

arisliang commented 3 years ago

I'm also confused by this line while (not node.can_add_child()) and (not node.is_terminal()):.

At the beginning of the round, node is root, node.can_add_child() will be true, and not node.can_add_child() will be false, so it will not enter the while loop. So the next node.add_random_child() will be run once only, therefore only 1 level below the root.

After that it'll be next round, so each round it only checks 1 level below the root.

What did I miss?

PS, unless the num_rounds are so many, that it exhausted all the legal moves of the current state (at most board size * board size, and increasingly less?), and still carry on rounds, in which case, it will enter the while loop, and start new levels down.

macfergus commented 3 years ago

Hi @arisliang, I don't think you've missed anything. As you noted, when it's exhausted the legal moves of the current state, it starts to go deeper. However, after the first layer, the tree starts to expand unevenly. So the subtree under move A could go much deeper than the subtree under move B, if move A tends to get a higher estimated winrate. You do need quite a few rounds to make this work -- the pure python implementation in the book is probably too slow to be useful on a 19x19 board.

btw, chapter 14 describes an improvement on the search algorithm, where unvisited children are assigned a "default" winrate. Then you can include unvisited children in the UCT calculation, and only hit them up when the UCT score indicates they're sufficiently interesting. But to make this work well, it needs a neural network to prioritize the moves to explore first

arisliang commented 3 years ago

Regarding the uneven tree expansion part, how likely select_child returns different child? e.g. if it returns child child-1 at some round, will it tend to still return child-1 or more likely to return a different child child-2 in the next round?

Because select_child assign the start node of the subtree to expand, if it returns the same child child-1 everytime, then random child will be added only below child-1 subtree.

In case, if select_child returns the same child everytime, the algorithm will only explore all children under root, plus all children under child-1

arisliang commented 3 years ago

Oh on another look, since select_child is in a while loop, even if it returns same child under the root, once exhausted the child's legal moves, it will select a grand child and start a new subtree two levels down. Overtime, it'll explore more levels down regardless whether it returns same child each time, as long as there're still rounds left to exhaust the legal moves for the selected subtrees.

arisliang commented 3 years ago

I still don't quite get the uneven part though. In order for the tree expansion to be uneven, node = self.select_child(node) needs to be called again and again in each round, before adding random children.

But it's not possible for select_child to be called again and again in each round, because each newly selected node have not visited any child yet, so they should always have random children to add, therefore would most likely exit the while's loop immediately.

So for each round, the while loop always tend to run once only and move down to add_random_child. This way the tree tends to be even, since it needs to exhaust all legal moves at each level, before it can expand the tree deeper.

arisliang commented 3 years ago

I think I get it now, even the node can't go deeper without visiting all its children, it'll tend to generate uneven trees.