AlphaGo Fan and AlphaGo Lee use two neural networks: a policy network that outputs
move probabilities and a value network that outputs a position evaluation.
Both networks are combined with Monte Carlo Tree Search (MCTS): the policy network narrows the search to promising moves, and leaf positions are evaluated by mixing the value network's output with the outcome of a fast rollout.
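As a concrete illustration of that leaf evaluation, a minimal sketch; lam = 0.5 is the mixing constant reported in the original AlphaGo paper, and the function name is my own, not from any released code:

```python
# Hedged sketch: AlphaGo Fan/Lee evaluate an MCTS leaf by mixing the value
# network's estimate with the outcome of a fast rollout.
def mixed_leaf_value(v_net: float, z_rollout: float, lam: float = 0.5) -> float:
    """Blend the value network's estimate v_net (in [-1, 1]) with the
    rollout outcome z_rollout (+1 win, -1 loss for the current player)."""
    return (1.0 - lam) * v_net + lam * z_rollout

# Example: the value net says +0.3 but the fast rollout ends in a loss.
print(mixed_leaf_value(0.3, -1.0))  # -0.35
```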
[ ] Pre-train the networks: the policy network is first trained by supervised learning on human expert games, then improved by self-play reinforcement learning; the value network is trained on positions from those self-play games.
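A minimal sketch of that supervised step, assuming PyTorch; the tiny conv net and the random batch are placeholders for the paper's deep network and the human game data:

```python
import torch
import torch.nn as nn

# Supervised pre-training step: maximize the likelihood of the human
# expert's move via cross-entropy over board points.
BOARD = 19
policy_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * BOARD * BOARD, BOARD * BOARD),  # one logit per board point
)
opt = torch.optim.SGD(policy_net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

boards = torch.randn(8, 3, BOARD, BOARD)              # stand-in for encoded positions
expert_moves = torch.randint(0, BOARD * BOARD, (8,))  # stand-in for expert move labels

loss = loss_fn(policy_net(boards), expert_moves)  # cross-entropy vs. the human move
opt.zero_grad()
loss.backward()
opt.step()
```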
Differences between AlphaGo Zero and AlphaGo Lee
First, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data.
Second, it uses only the black and white stones from the board as input features.
Third, it uses a single neural network, rather than separate policy and value networks.
Fourth, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte Carlo rollouts.
AlphaGo Zero merges the two networks of AlphaGo Lee into a single network with two outputs: a probability for each move, and a scalar value estimating the probability of the current player winning from the current position (see the dual-head sketch below).
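A minimal sketch of such a dual-head network, assuming PyTorch; the one-layer trunk stands in for the paper's residual tower, and the two input planes (black stones, white stones) omit the history planes the paper also stacks:

```python
import torch
import torch.nn as nn

BOARD = 19

class DualHeadNet(nn.Module):
    """Shared trunk feeding a policy head and a value head."""
    def __init__(self):
        super().__init__()
        # Shared trunk; input planes encode only the black and white stones
        self.trunk = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Policy head: one logit per board point, plus one for "pass"
        self.policy = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * BOARD * BOARD, BOARD * BOARD + 1),
        )
        # Value head: scalar in [-1, 1] via tanh
        self.value = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * BOARD * BOARD, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h), self.value(h)

net = DualHeadNet()
p_logits, v = net(torch.randn(1, 2, BOARD, BOARD))
print(p_logits.shape, v.item())  # torch.Size([1, 362]) and a value in [-1, 1]
```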
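And a sketch of how that single network drives the simpler search: one forward pass per leaf supplies both the move priors and the backed-up value, with no rollout played. It reuses the hypothetical DualHeadNet/net from the sketch above:

```python
import torch

def evaluate_leaf_zero(board_planes, net):
    """AlphaGo Zero style leaf evaluation: one network call replaces both the
    fast rollout and the separate value network of AlphaGo Lee."""
    with torch.no_grad():
        p_logits, v = net(board_planes)
    priors = torch.softmax(p_logits, dim=-1)  # prior probabilities for new edges
    return priors, v.item()                   # both are backed up the tree

# Usage with the DualHeadNet sketch above:
priors, value = evaluate_leaf_zero(torch.randn(1, 2, 19, 19), net)
```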