peter1591 / hearthstone-ai

A Hearthstone AI based on Monte Carlo tree search and neural nets written in modern C++.

MCTS selection policy #61

Closed: peter1591 closed this issue 7 years ago

peter1591 commented 7 years ago

Try this one? PUCT, as used in AlphaGo: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.172.9450&rep=rep1&type=pdf

peter1591 commented 7 years ago

PUCT is probably used in AlphaGo because:

  1. There are far too many available actions (branching factor ~250).
  2. PUCT needs episode context, which in AlphaGo's implementation is the prior probability distribution produced by the policy network.
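To make point 2 concrete, here is a minimal C++ sketch of the PUCT selection rule in the form commonly written for AlphaGo: pick the action maximizing Q(s,a) + c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)), where P(s,a) is the prior from the policy network. The struct `Edge`, the function `SelectPuct`, and the value of `c_puct` are hypothetical illustrations, not code from this repository. The sketch shows that the exploration bonus is scaled directly by the prior, so without a meaningful P(s,a) the rule loses its main advantage.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-edge statistics (illustrative; not hearthstone-ai's types).
struct Edge {
    double prior = 0.0;        // P(s,a), e.g. from a policy network
    double total_value = 0.0;  // W(s,a), sum of backed-up values
    int visits = 0;            // N(s,a)
};

// Q(s,a): mean backed-up value; 0 for unvisited edges (one convention among several).
double MeanValue(const Edge& e) {
    return e.visits > 0 ? e.total_value / e.visits : 0.0;
}

// Returns argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
// Assumes `edges` is non-empty. Before any child is visited the exploration term
// is 0 for every edge, so the first index wins the tie.
std::size_t SelectPuct(const std::vector<Edge>& edges, double c_puct) {
    int parent_visits = 0;
    for (const Edge& e : edges) parent_visits += e.visits;
    const double sqrt_parent = std::sqrt(static_cast<double>(parent_visits));

    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < edges.size(); ++i) {
        const Edge& e = edges[i];
        // Exploration bonus: a large prior and few visits give a large bonus.
        const double u = c_puct * e.prior * sqrt_parent / (1.0 + e.visits);
        const double score = MeanValue(e) + u;
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

With a high `c_puct`, an unvisited action with a large prior can outrank a visited action with a better mean value; with a small `c_puct`, the mean value dominates.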

In Hearthstone there are far fewer possible actions, and we do not have a policy network at hand.
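Without a policy network, a prior-free rule such as UCB1 applied to tree search (UCT) is the usual alternative. It needs only visit counts and mean rewards. The following is a minimal C++ sketch under stated assumptions: rewards lie in [0, 1], unvisited children are tried first, and the names `ChildStats` and `SelectUcb1` are hypothetical, not taken from this repository.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics (illustrative; not hearthstone-ai's types).
struct ChildStats {
    double total_value = 0.0;  // sum of backed-up rewards, assumed in [0, 1]
    int visits = 0;            // N(s,a)
};

// UCB1 / UCT selection: argmax_a mean(a) + c * sqrt(ln(N_parent) / N(s,a)).
// No prior is needed. Assumes `children` is non-empty.
std::size_t SelectUcb1(const std::vector<ChildStats>& children, double c) {
    int parent_visits = 0;
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i].visits == 0) return i;  // try every child once first
        parent_visits += children[i].visits;
    }
    const double log_parent = std::log(static_cast<double>(parent_visits));

    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < children.size(); ++i) {
        const ChildStats& ch = children[i];
        const double mean = ch.total_value / ch.visits;
        // Exploration term shrinks as this child is visited more often.
        const double score = mean + c * std::sqrt(log_parent / ch.visits);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

The classic UCB1 constant for rewards in [0, 1] is c = sqrt(2); in practice it is usually tuned.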