keithgw / wingspan

A Wingspan AI that uses reinforcement learning to learn to play the wingspan board game.
2 stars 1 forks source link

Choose a strategy for dealing with the large state space #50

Closed keithgw closed 10 months ago

keithgw commented 10 months ago

The partial information nature of the game, means that the state transition model will be stochastic. The sources of uncertainty come from not knowing which cards will be drawn from the deck, and which cards the opponent(s) has/have in their hand(s).

One immediate simplification, could be removing the uncertainty from player hands by playing the game as open-handed. This could be modified later.

As for the deck, some possible strategies include;

  1. Tree Pruning: You can prune the tree to limit its size. This could involve setting a maximum depth for the tree, or removing branches that are unlikely to be selected based on their current estimated value. This becomes easier with a value network, because strategies like alpha-beta pruning become possible.
  2. Memory Bounded MCTS: You can implement a version of MCTS that is designed to work within a specified memory limit. When the memory limit is reached, the algorithm could remove the least promising nodes to make room for new ones.
  3. Monte Carlo Tree Search with UCT (Upper Confidence Bound applied to Trees): UCT uses a balance of exploitation and exploration to select promising nodes, which can effectively reduce the size of the tree. This is already being done, but still each unexplored state will need to be played out at least once.
  4. Transposition Tables: In some games, different sequences of actions can lead to the same game state. By storing these states in a transposition table, you can avoid duplicating parts of the game tree. This would require some hashing of the game state to store unique keys for each state. A FEN-like approach might be possible, but could be a lot of work. Alternatively, could make use of JSON hashing, and represent the game state as a JSON object. Will potentially need to handle hash collisions.
  5. Parallelization: If the size of the tree is causing computational issues, you can use parallel computing techniques to distribute the computation across multiple cores or machines.
  6. Simplification: If possible, you can simplify the game state or action space to reduce the size of the tree. This could involve abstracting similar states or actions, or ignoring certain aspects of the game state. This combined with hashing game states and using transposition tables seems promising, especially for the simple versions of the game. For example, instead of treating all 180 cards as unique, they can be reduced to their VP and food cost. Additionally, the order of played birds on the game board has no impact without brown power activations. Finally, this approach will already be required for training a value network. It would be beneficial to represent the game state to the value network in the same way as representing the game state for the purpose of building a game tree.
keithgw commented 10 months ago

This will be a combination of 3, 4, and 6. The simplification is outlined in #51

keithgw commented 10 months ago

Because the number of turns decreases as the game continues, transposition tables won't be useful in simplifyin the game tree. For example, the same hand will be worth more or less depending on the turns left remaining to get food. The good news is that the game tree will always be finite. The bad news is that the size of the child state space from any node can be as large as |deck|*|players|. It is possible to calculate the probability of drawing type (vp, cost) exactly with known information about the deck. This might be useful for simplifiying the number of times each state should be visited.

keithgw commented 10 months ago

The inital solution will use:

  1. MCTS w/ probability weighted UCT. Probability of next state given an action can be well estimated from known information of the deck. These probabilities can be improved in the future by tracking a history of known missing cards (#57, #58).
  2. Transposition tables will be used when sampling from the state space following an opponent's turn. Since the game turn is always increasing, they will be of limited use.
  3. Simplification of the game state. For now, that has been achieved according to #51. Improvements are likely to be found.