mokemokechicken / reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.
MIT License

Random flip and rotation when evaluate #16

Closed apollo-time closed 6 years ago

apollo-time commented 6 years ago

I see you apply a random flip and rotation in the Player's expand_and_evaluate function. I think this is needed when adding data for training, but isn't necessary when evaluating a position to select an action. What do you think?

mokemokechicken commented 6 years ago

Hi @apollo-time

It is written in DeepMind's paper

Expand and evaluate (Fig. 2b). The leaf node sL is added to a queue for neural network evaluation, (di(p), v) = fθ(di(sL)), where di is a dihedral reflection or rotation selected uniformly at random from i in [1..8].

and their new paper

The rules of Go are invariant to rotation and reflection. This fact was exploited in AlphaGo and AlphaGo Zero in two ways. First, training data was augmented by generating 8 symmetries for each position. Second, during MCTS, board positions were transformed using a randomly selected rotation or reflection before being evaluated by the neural network, so that the Monte-Carlo evaluation is averaged over different biases.
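Since Reversi shares Go's 8-fold board symmetry, the same trick applies here. As a minimal sketch (function names are hypothetical, not the repo's actual API): pick one of the 8 dihedral transforms at random before calling the network, and map the returned policy plane back through the inverse transform so move probabilities line up with the untransformed board.

```python
import numpy as np

def apply_dihedral(plane, i):
    """Apply dihedral symmetry i in [0..7] to a square board plane:
    4 rotations, optionally composed with a horizontal flip."""
    p = np.rot90(plane, k=i % 4)
    if i >= 4:
        p = np.fliplr(p)
    return p

def invert_dihedral(plane, i):
    """Undo apply_dihedral: reverse the flip first, then rotate back."""
    p = np.fliplr(plane) if i >= 4 else plane
    return np.rot90(p, k=-(i % 4))

def random_dihedral(plane):
    """Transform with a symmetry chosen uniformly at random, returning
    the index so the network's policy output can be mapped back."""
    i = np.random.randint(8)
    return apply_dihedral(plane, i), i
```

During expand_and_evaluate this would look like `transformed, i = random_dihedral(board)`, then `policy, value = model.predict(transformed)`, then `policy = invert_dihedral(policy, i)`; the value head needs no inverse since it is a scalar.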