Hello, first of all, thanks for your great work! I'm following your book and got stuck in chapter 13 implementing the AlphaGo tree search mechanism.
More specifically, when I run the algorithm for selecting a move, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This comes from the function select_child, where
return max(self.children.items(), key=lambda child: child[1].q_value + child[1].u_value)
ends up comparing arrays instead of single numbers, because q_value + u_value is an array for each child.
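For reference, here is a minimal sketch that seems to reproduce the same error outside the book's code. The Node class and all the numbers are made up for illustration; only the max(...) expression is taken from select_child. I'm assuming the real nodes behave the same way with respect to q_value and u_value.

```python
import numpy as np


class Node:
    # Hypothetical stand-in for the tree node; only the two fields used below.
    def __init__(self, q_value, u_value):
        self.q_value = q_value
        self.u_value = u_value


def select_child(children):
    # Same expression as in the failing line: pick the (move, node) pair
    # with the largest q_value + u_value.
    return max(children.items(),
               key=lambda child: child[1].q_value + child[1].u_value)


# Case 1: scalar scores. max() compares plain floats and works.
scalar_children = {"a": Node(0.1, 0.5), "b": Node(0.3, 0.4)}
best_move, best_node = select_child(scalar_children)

# Case 2: q_value is an array with more than one element.
# Comparing two such keys yields a boolean array, and max() cannot
# turn that into a single True/False, so NumPy raises ValueError.
array_children = {
    "a": Node(np.array([0.1, 0.2]), 0.5),
    "b": Node(np.array([0.3, 0.0]), 0.4),
}
try:
    select_child(array_children)
    raised = False
except ValueError:
    raised = True
```

With scalars the call returns a single (move, node) pair, so as far as I can tell the expression only makes sense if each child has exactly one number for q_value + u_value.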
This also made me question what we are trying to calculate here. We want to select the action (move) that maximizes q(s,a) + u(s,a), but for each possible action q(s,a) + u(s,a) is an entire array rather than a single number, so what exactly should we be maximizing?
Thanks