Open shengkelong opened 1 year ago
The noise is not set in expand_node(). What is set there is only a tentative policy, which is replaced by the NN policy in process_mini_batch(). So you are right: the MCTS process is not random.
CGLemon is right. There is little randomness when TamaGo runs as a normal MCTS player. If you want to add randomness to TamaGo, I can modify it so that it can run like AlphaZero (Dirichlet noise, and move generation sampled from the distribution of visit counts).
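For readers unfamiliar with the two AlphaZero-style mechanisms mentioned above, here is a minimal sketch in plain Python. It is not TamaGo's code: the function names `add_dirichlet_noise` and `sample_move_from_visits`, and the default values of `alpha`, `epsilon`, and `temperature`, are assumptions chosen for illustration only.

```python
import random


def add_dirichlet_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Mix Dirichlet noise into root priors: (1 - epsilon) * p + epsilon * noise.

    Hypothetical helper; alpha and epsilon defaults are illustrative assumptions.
    """
    # A Dirichlet(alpha, ..., alpha) sample is a vector of independent
    # Gamma(alpha, 1) draws divided by their sum.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total <= 0.0:
        # Extremely unlikely underflow case: fall back to uniform noise.
        noise = [1.0 / len(priors)] * len(priors)
    else:
        noise = [g / total for g in gammas]
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def sample_move_from_visits(visits, temperature=1.0, rng=random):
    """Pick a child index with probability proportional to visits ** (1 / temperature).

    Hypothetical helper; raises ValueError if every visit count is zero.
    """
    weights = [v ** (1.0 / temperature) for v in visits]
    return rng.choices(range(len(visits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    root_priors = [0.7, 0.2, 0.1]   # made-up network policy at the root
    print(add_dirichlet_noise(root_priors, rng=rng))
    root_visits = [120, 60, 20]     # made-up visit counts after search
    print(sample_move_from_visits(root_visits, rng=rng))
```

In this sketch the noise is applied only to the root priors, after the network policy is available, so it is not overwritten by a later policy update. Sampling from visit counts replaces the usual "pick the most visited child" rule at move-selection time.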
I observed that the policy is set to noise in "expand_node", but the "update_policy" call used during inference (in "process_mini_batch") directly overwrites the policy with the network's output. As a result, there is no randomness at all except in self-play games.