maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

Chapter 13 error on AlphaGONode when selecting a child #77

Open meiner75 opened 3 years ago

meiner75 commented 3 years ago

Hello, first of all, thanks for your great work! I'm following your book and got stuck in chapter 13 while implementing the AlphaGo tree search mechanism.

More specifically, when I run the algorithm for selecting a move, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This comes from the select_child function, where

return max(self.children.items(), key=lambda child: child[1].q_value + child[1].u_value)

is trying to take the max over arrays rather than scalars, because q_value + u_value is an array for each child.
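A minimal sketch that reproduces the error outside the book's code. FakeNode and the standalone select_child below are hypothetical stand-ins, not the classes from the repository; the only assumption is that q_value and u_value hold NumPy arrays with more than one element.

```python
import numpy as np


class FakeNode:
    """Hypothetical stand-in for a tree node; not the book's class."""

    def __init__(self, q_value, u_value):
        self.q_value = q_value
        self.u_value = u_value


def select_child(children):
    # Same expression as in the issue, applied to a plain dict of nodes.
    return max(children.items(),
               key=lambda child: child[1].q_value + child[1].u_value)


# Case 1: q_value and u_value are arrays, so each key is an array.
array_children = {
    "a": FakeNode(np.array([0.1, 0.2]), np.array([0.3, 0.1])),
    "b": FakeNode(np.array([0.2, 0.0]), np.array([0.1, 0.5])),
}

try:
    select_child(array_children)
    error_message = None
except ValueError as err:
    # max() compares two array keys with ">", gets a boolean array back,
    # and cannot reduce it to a single True/False.
    error_message = str(err)

# Case 2: scalar values give max() a well-defined ordering.
scalar_children = {
    "a": FakeNode(0.1, 0.3),
    "b": FakeNode(0.2, 0.5),
}
best_move, best_node = select_child(scalar_children)

print(error_message)
print(best_move)
```

With scalar values the same expression works, which suggests the problem lies in how q_value and u_value get populated rather than in select_child itself.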

This also made me question what we are trying to calculate here. We want to select the action (move) that maximizes q(s,a) + u(s,a), but for each possible action we have an entire array given by q(s,a) + u(s,a), so what exactly do we want to calculate here?
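For reference, a sketch of how q(s,a) + u(s,a) looks when both terms are scalars per move. It assumes the AlphaGo-style bonus u(s,a) = c_u * P(s,a) * sqrt(N_parent) / (1 + N(s,a)), where P(s,a) is the single prior probability for that move. The names to_scalar and u_value, the constant c_u = 5.0, and all numbers are made up for illustration and are not taken from the book's code.

```python
import numpy as np


def to_scalar(value):
    """Collapse a one-element array (e.g. shape (1, 1)) to a Python float."""
    arr = np.asarray(value, dtype=float)
    if arr.size != 1:
        raise ValueError("expected exactly one element, got shape %s" % (arr.shape,))
    return float(arr.reshape(-1)[0])


def u_value(prior, parent_visits, visits, c_u=5.0):
    # Exploration bonus for one move; every input is a scalar.
    return c_u * prior * np.sqrt(parent_visits) / (1 + visits)


# Hypothetical policy output for a position with three legal moves.
priors = np.array([0.2, 0.5, 0.3])
q_values = {0: 0.1, 1: 0.0, 2: 0.4}   # running value estimate per move
visits = {0: 3, 1: 0, 2: 1}           # visit count per move
parent_visits = 4

# Index the prior vector per move instead of storing the whole vector on each child.
scores = {
    move: q_values[move] + u_value(to_scalar(priors[move]), parent_visits, visits[move])
    for move in q_values
}
best_move = max(scores, key=scores.get)

# A value-network style output of shape (1, 1) reduced to a float.
leaf_value = to_scalar(np.array([[0.25]]))

print(scores)
print(best_move)
```

Under this reading, each child holds one number for q and one for u, and an array would only appear if the full prior vector or a raw network output were stored on a child without indexing or flattening.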

Thanks