apollo-time opened this issue 6 years ago
Hi @apollo-time
- Why isn't Q updated with N/W at this point?
- Shouldn't it be W = W + virtual loss when the player is white?

Thank you, that's a very good point! That is a serious bug in the virtual loss handling (the virtual loss on W didn't work).
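For what it's worth, here is a minimal sketch (my own illustration, not this repository's actual code) of one way to keep that bookkeeping consistent: the edge statistics are stored from the perspective of the player to move, so the virtual loss is always subtracted from W regardless of colour, and Q is recomputed from W/N whenever N or W changes. `Edge`, `VIRTUAL_LOSS` and the function names are assumptions of the sketch.

```python
# Sketch: side-to-move perspective for edge statistics, so the virtual loss
# has the same sign for black and white.
from dataclasses import dataclass

VIRTUAL_LOSS = 3  # hypothetical constant


@dataclass
class Edge:
    n: float = 0.0  # visit count
    w: float = 0.0  # total value, from the perspective of the player to move
    q: float = 0.0  # mean value, kept equal to w / n


def apply_virtual_loss(edge: Edge) -> None:
    # Pretend this edge has already lost VIRTUAL_LOSS simulations so that
    # other threads are discouraged from selecting the same variation.
    edge.n += VIRTUAL_LOSS
    edge.w -= VIRTUAL_LOSS      # a loss for the player to move, whichever colour that is
    edge.q = edge.w / edge.n    # keep Q in sync with the new N and W


def backup(edge: Edge, value: float) -> None:
    # Revert the virtual loss and record the real simulation result
    # (value must already be signed for the player to move at this edge).
    edge.n += 1 - VIRTUAL_LOSS
    edge.w += value + VIRTUAL_LOSS
    edge.q = edge.w / edge.n
```

Storing the statistics from the side-to-move perspective avoids the black/white sign check entirely; the alternative is to keep a fixed perspective (e.g. black's) and flip the sign of both the virtual loss and the backed-up value on white's turns.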
- Why don't you share the tree between the two players?

Because if the models for black and white are different, their MCTS results are also different.
I see that the two players use the same model in self-play mode.
Yes, that's right. Although it is a little difficult to implement, sharing the tree-search results may be useful for saving computation cost.
Just for your reference, I am sharing the search tree between the 2 players; see the code here: https://github.com/gooooloo/alpha-zero-in-python/blob/master/src/reversi_zero/agent/player.py

But I don't think this makes a big difference. Many other settings are much more important, such as the number of simulations, the resignation threshold, the performance trade-off between the self/opt/eval modules, etc.
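Just to illustrate the idea (a sketch under my own assumptions, not gooooloo's or this repository's actual implementation), sharing the tree can be as simple as both player objects holding a reference to the same statistics tables; `SharedTree`, `SelfPlayPlayer` and the field names are hypothetical.

```python
# Sketch: two self-play players reading and writing one search tree.
# Assumes both players use the same model, so their statistics are compatible.
from collections import defaultdict


class SharedTree:
    def __init__(self):
        # Per-(state, action) visit counts and total values, plus per-state priors.
        self.n = defaultdict(lambda: defaultdict(float))
        self.w = defaultdict(lambda: defaultdict(float))
        self.p = {}


class SelfPlayPlayer:
    def __init__(self, model, tree):
        self.model = model
        self.tree = tree  # both players point at the same SharedTree instance

    def action(self, state):
        # ...run MCTS simulations against self.tree, then pick a move from the
        # accumulated visit counts (details omitted in this sketch)...
        raise NotImplementedError


def self_play_game(model):
    tree = SharedTree()                  # one tree for the whole game
    black = SelfPlayPlayer(model, tree)
    white = SelfPlayPlayer(model, tree)  # reuses black's statistics and vice versa
    return black, white
```

The caveat is the one mentioned above: this only makes sense while both colours are evaluated by the same model, as in self-play; in evaluation games between two different models the shared statistics would mix two different opinions.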
I see that DeepMind backs up the reward to the parent nodes without modification. Why not use a discount rate γ?
- Why not use a discount rate γ?
It is a difficult question.

The reason to use a discount rate would be that positions far from the end of the game are less related to the final result, so their value signal could be weakened. Conversely, I think the reason not to use one is that these games have no intermediate reward: the only signal is the final result, and AlphaZero simply backs it up to every position without modification. Thinking that way, not discounting is the more natural choice.
But I think the first move is not related to the final result in the same way the final move is, when the game is long. In Reversi there is effectively only one kind of first move, so it does not matter there, but in Go or Chess there may be a possibility that the first move ends up being evaluated as a bad move.
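To make the difference concrete, here is a small sketch (my own illustration; the value of γ, the helper names and the example game are all assumptions) comparing the AlphaZero-style target, where every position receives the final result unchanged, with a hypothetical discounted variant where position t gets γ^(T-1-t) times the result.

```python
# Sketch: value targets for the positions of one finished game.
# Each entry of signed_results is the final result from the perspective of the
# player to move at that position (+1 win, -1 loss, 0 draw).

def alphazero_targets(signed_results):
    # AlphaGo Zero / AlphaZero style: back up the final result unchanged.
    return list(signed_results)

def discounted_targets(signed_results, gamma=0.99):
    # Hypothetical variant: positions far from the end get an attenuated signal.
    T = len(signed_results)
    return [(gamma ** (T - 1 - t)) * z for t, z in enumerate(signed_results)]

# Example: a 60-move game won by the player to move at even plies.
results = [+1 if t % 2 == 0 else -1 for t in range(60)]
print(alphazero_targets(results)[0])              # 1: first move gets full credit
print(round(discounted_targets(results)[0], 3))   # 0.553: first move's target is shrunk
```

With γ = 0.99 and a 60-move game, the first position's target shrinks to roughly 0.55 of the final result, which is one way of encoding the intuition that early moves are only loosely tied to the outcome of a long game.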
https://github.com/mokemokechicken/reversi-alpha-zero/blob/f1cfa6c7177ec5f76a89e20fd97eb4c5d678611d/src/reversi_zero/agent/player.py#L165-L168
I see that N and W are updated with the virtual loss when selecting a node, in order to discourage other threads from simultaneously exploring the identical variation (as described in the paper).
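As a tiny numerical illustration of that effect (the PUCT constant, the virtual-loss size and the edge statistics below are made-up values, not taken from this repository): adding a virtual loss to the edge one thread is currently exploring lowers both its Q and its exploration term, so another thread evaluating the same parent tends to choose a different child.

```python
import math

C_PUCT = 1.5        # assumed exploration constant
VIRTUAL_LOSS = 3    # assumed virtual-loss size

def puct(q, p, n, n_parent):
    # PUCT score: Q plus the prior-weighted exploration bonus.
    return q + C_PUCT * p * math.sqrt(n_parent) / (1 + n)

# One edge before and after a thread applies its virtual loss.
n, w, p, n_parent = 10, 4.0, 0.3, 50
print(puct(w / n, p, n, n_parent))                    # ~0.69, as seen by the first thread
n2, w2 = n + VIRTUAL_LOSS, w - VIRTUAL_LOSS
print(puct(w2 / n2, p, n2, n_parent + VIRTUAL_LOSS))  # ~0.31, so a second thread looks elsewhere
```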