Mastering the game of Go without human knowledge (AlphaGo Zero) #11

Open joapolarbear opened 4 years ago

joapolarbear commented 4 years ago

Differences between AlphaGo Zero and AlphaGo Lee

  1. is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data.
  2. uses only the black and white stones from the board as input features.
  3. uses a single neural network, rather than separate policy and value networks.
  4. uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts.
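
To make points 2 to 4 concrete, here is a minimal sketch in plain Python, not the paper's actual architecture. Everything in it is an assumption for illustration: the 3x3 toy board, the names `encode`, `DualHeadNet`, and `evaluate_leaf`, the tiny fully connected layers, and the random untrained weights. It only shows the shape of the idea: the input is built from stone positions alone, one shared body feeds both a policy head and a value head, and a search leaf is scored by a single network call instead of a Monte Carlo rollout.

```python
import math
import random

BOARD = 3                      # toy board size (assumption; Go uses 19)
N_MOVES = BOARD * BOARD + 1    # every point plus "pass"


def encode(board, to_play):
    """Two binary planes: the current player's stones, then the opponent's."""
    own = [1.0 if p == to_play else 0.0 for p in board]
    opp = [1.0 if p == -to_play else 0.0 for p in board]
    return own + opp


class DualHeadNet:
    """One shared body feeding a policy head and a value head (untrained)."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        n_in = 2 * BOARD * BOARD

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_body = mat(hidden, n_in)
        self.w_policy = mat(N_MOVES, hidden)
        self.w_value = mat(1, hidden)

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def forward(self, features):
        # Shared representation used by both heads.
        h = [math.tanh(z) for z in self._matvec(self.w_body, features)]
        # Policy head: softmax over all moves.
        logits = self._matvec(self.w_policy, h)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: scalar in [-1, 1] from the current player's view.
        value = math.tanh(self._matvec(self.w_value, h)[0])
        return policy, value


def evaluate_leaf(net, board, to_play):
    """Leaf evaluation for tree search: one network call, no rollout."""
    return net.forward(encode(board, to_play))


board = [0] * (BOARD * BOARD)  # 0 empty, +1 black, -1 white
board[4] = 1                   # black stone in the centre
policy, value = evaluate_leaf(DualHeadNet(), board, to_play=-1)
```

In a full search, `policy` would serve as the move priors at the new leaf and `value` would be backed up the tree in place of a rollout result.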