Hi @apollo-time
It is written in DeepMind's paper:
Expand and evaluate (Fig. 2b). The leaf node s_L is added to a queue for neural network evaluation, (d_i(p), v) = f_θ(d_i(s_L)), where d_i is a dihedral reflection or rotation selected uniformly at random from i in [1..8].
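For concreteness, here is a minimal sketch of that step, assuming a board-shaped policy grid and a `net` callable returning `(policy, value)` (both are my assumptions for illustration, not this repo's actual API):

```python
import numpy as np

def evaluate_with_random_symmetry(board, net):
    """Sketch: evaluate a leaf under a random dihedral symmetry d_i
    (8 elements: 4 rotations x optional reflection) and map the
    policy back to the original board orientation."""
    i = np.random.randint(8)       # pick one of the 8 symmetries
    k, flip = i % 4, i >= 4        # k quarter-turns, optional flip

    b = np.rot90(board, k)
    if flip:
        b = np.fliplr(b)

    # hypothetical network call; the real policy head also has a
    # pass move, omitted here for simplicity
    p, v = net(b)

    # undo the transform on the policy so moves line up with `board`
    if flip:
        p = np.fliplr(p)
    p = np.rot90(p, -k)
    return p, v
```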
and in their new paper:
The rules of Go are invariant to rotation and reflection. This fact was exploited in AlphaGo and AlphaGo Zero in two ways. First, training data was augmented by generating 8 symmetries for each position. Second, during MCTS, board positions were transformed using a randomly selected rotation or reflection before being evaluated by the neural network, so that the Monte-Carlo evaluation is averaged over different biases.
I see you apply a random flip and rotation in the Player's expand_and_evaluate function. I think this is needed when adding data for training, but it isn't necessary when evaluating a position to select an action. What do you think? A sketch of what I mean is below.
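That is, apply the 8 symmetries only when building training examples, and feed the raw board to the network during MCTS. A rough sketch, assuming a hypothetical helper where `pi` is the MCTS visit-count policy as a board-shaped grid and `z` is the game outcome (again, illustrative names, not repo code):

```python
import numpy as np

def make_training_examples(board, pi, z):
    """Sketch: augment with all 8 dihedral symmetries only at
    training-data time, transforming board and policy together."""
    examples = []
    for k in range(4):                 # 4 rotations
        for flip in (False, True):     # x optional reflection = 8
            b, p = np.rot90(board, k), np.rot90(pi, k)
            if flip:
                b, p = np.fliplr(b), np.fliplr(p)
            examples.append((b, p, z))
    return examples
```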