Vovak1919 opened this issue 3 years ago
Yep you are correct! Check #137 #74. This repo was actually based on AlphaGo Zero.
Thanks! What do you make of this difference between AGZ and AZ?
> AlphaGo Zero tuned the hyper-parameter of its search by Bayesian optimisation. In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning. The sole exception is the noise that is added to the prior policy to ensure exploration (29); this is scaled in proportion to the typical number of legal moves for that game type.
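To make the quoted passage concrete, here is a minimal stdlib-only sketch of mixing Dirichlet noise into the root prior policy. The mixing weight `eps` and the inverse scaling via a `base_alpha` constant are my assumptions for illustration; the AlphaZero paper only states that the noise concentration is scaled in proportion to the typical number of legal moves (its published values were 0.3 for chess, 0.15 for shogi, and 0.03 for Go).

```python
import random

def dirichlet(alpha, n):
    # Sample a symmetric Dirichlet(alpha, ..., alpha) vector by
    # normalising independent Gamma draws (stdlib only, no numpy).
    draws = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def add_exploration_noise(priors, typical_legal_moves, eps=0.25, base_alpha=10.0):
    # base_alpha is a hypothetical constant chosen so that, e.g.,
    # ~35 legal moves (chess) gives alpha near the paper's 0.3.
    alpha = base_alpha / typical_legal_moves
    noise = dirichlet(alpha, len(priors))
    # Convex combination keeps the result a valid probability vector.
    return [(1 - eps) * p + eps * x for p, x in zip(priors, noise)]
```

Because the result is a convex combination of two distributions, it still sums to 1, so it can be dropped into the root of the MCTS without renormalising.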
Another question, if this topic is still relevant to you: what do you make of the pseudocode.py file from the Supplementary Materials?
I started studying the alpha-zero-general implementation, and I found this parameter in the main.py module:
And this is from the coach.py module:
Are there any discrepancies with the original description of "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm"?
Did I understand correctly that this is the AlphaGo Zero algorithm, and not AlphaZero?