MCTS selection policy

peter1591 / hearthstone-ai

A Hearthstone AI based on Monte Carlo tree search and neural nets written in modern C++.

Closed: peter1591 closed this issue 7 years ago

peter1591 commented 7 years ago

PUCT may be used in AlphaGo because:

There are too many available actions (a branching factor of ~250).
PUCT needs episode context, which in AlphaGo's implementation is the prior probability distribution produced by the policy network.
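To make the dependence on episode context concrete, here is a minimal sketch of an AlphaGo-style PUCT selection rule, where each child's exploration bonus is scaled by its prior P(s, a). The struct `PuctChild`, the function `SelectPuct`, and the choice to take the parent visit count as the sum of child visits are hypothetical illustrations, not code from hearthstone-ai.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics; not taken from hearthstone-ai.
struct PuctChild {
    double total_value = 0.0;  // sum of rewards backed up through this child
    int visits = 0;            // N(s, a)
    double prior = 0.0;        // P(s, a), e.g. a policy network's output
};

// AlphaGo-style PUCT: pick argmax_a Q(s,a) + U(s,a), where
// U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
std::size_t SelectPuct(const std::vector<PuctChild>& children, double c_puct) {
    assert(!children.empty());
    int parent_visits = 0;  // sum_b N(s, b)
    for (const auto& ch : children) parent_visits += ch.visits;
    const double sqrt_parent = std::sqrt(static_cast<double>(parent_visits));

    std::size_t best = 0;
    double best_score = std::numeric_limits<double>::lowest();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const PuctChild& ch = children[i];
        // Mean value; unvisited children default to Q = 0.
        const double q = ch.visits > 0 ? ch.total_value / ch.visits : 0.0;
        // The prior scales the exploration bonus, so high-prior moves are tried first.
        const double u = c_puct * ch.prior * sqrt_parent / (1.0 + ch.visits);
        if (q + u > best_score) {
            best_score = q + u;
            best = i;
        }
    }
    return best;
}
```

Without a meaningful `prior`, the bonus term loses the information that lets PUCT focus on a few promising moves out of hundreds.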

In Hearthstone, there are far fewer possible actions, and we do not have a policy network at hand.
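Without a policy network to supply priors, one common choice is plain UCB1 (the rule used by UCT), which needs only visit counts and mean rewards. The following is a minimal sketch under that assumption; `UcbChild` and `SelectUcb1` are hypothetical names, not code from hearthstone-ai, and the parent visit count is assumed to equal the sum of child visits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics; no prior is needed.
struct UcbChild {
    double total_value = 0.0;  // sum of rewards backed up through this child
    int visits = 0;            // n_a
};

// UCB1 (as used by UCT): visit every child once, then pick
// argmax_a mean_a + c * sqrt(ln(N) / n_a), where N = sum of child visits.
std::size_t SelectUcb1(const std::vector<UcbChild>& children, double c) {
    assert(!children.empty());
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i].visits == 0) return i;  // unvisited: treat bonus as infinite
    }
    int parent_visits = 0;
    for (const auto& ch : children) parent_visits += ch.visits;
    const double log_parent = std::log(static_cast<double>(parent_visits));

    std::size_t best = 0;
    double best_score = std::numeric_limits<double>::lowest();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const UcbChild& ch = children[i];
        const double mean = ch.total_value / ch.visits;
        const double bonus = c * std::sqrt(log_parent / ch.visits);
        if (mean + bonus > best_score) {
            best_score = mean + bonus;
            best = i;
        }
    }
    return best;
}
```

The constant `c` is a tunable exploration parameter; c = sqrt(2) corresponds to the original UCB1 analysis for rewards in [0, 1].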