keithgw / wingspan

A Wingspan AI that uses reinforcement learning to learn to play the wingspan board game.
2 stars 1 forks source link

Implement basic MCTS #47

Open keithgw opened 10 months ago

keithgw commented 10 months ago

MCTS will allow the agent to "playout" a game from the current state to generate a distribution over action-values. This will be used to generate a policy: state -> action.

keithgw commented 10 months ago
keithgw commented 10 months ago

Updated thoughts on tree policy, playout policy, returned policy, and opponent policy.

keithgw commented 10 months ago

With such a large state-action space, will not start with persisiting the tree between real game turns. Instead, we'll build the tree anew each turn.

keithgw commented 10 months ago

Since the transition model is stochastic, the game tree is going to get very large. It is unclear how expansion should work with a stochastic transition mocdel.

keithgw commented 10 months ago

Updated thoughts on tree policy:

keithgw commented 10 months ago

Including nodes that represent opponent players' turns would allow for a minimax like policy. This could be a potential improvement to the policy later, but for now, we will only represent nodes in the game tree when it is the acting player's turn. Then, the simple policy will be to max_a(Q(s, a)).

keithgw commented 10 months ago

Going to follow the exptimax approximation from https://arxiv.org/pdf/0909.0801.pdf to deal with stochasticity. This will require two types of nodes: decision and chance.

Decision Nodes:

Chance Nodes:

This will require the "flattening" of decision nodes, so that chance nodes are always children. For example, the decision node that starts a turn at "choose action" would have had "play a bird" as a decision node child. Instead, the "play a bird" action should get flattened to "play bird i" for i in |hand|. Similarly, "draw a bird" gets flattened to "draw tray bird i" for i in |tray| + "draw from deck"

keithgw commented 10 months ago

There is no need to enforce that decision nodes and chance nodes strictly alternate. Handling how the tree should be traversed using recursion and handling traversal through decision, chance, leaf, and terminal nodes differently will allow for any arbitrary construction of the game tree.