hlynurd / open-mtg

A python implementation of Magic: The Gathering with an AI that plays it using Monte Carlo move evaluation.
MIT License
140 stars 32 forks source link

Overloading MCTS nodes with garbage data #8

Closed Meerkov closed 6 years ago

Meerkov commented 6 years ago

https://github.com/hlynurd/open-mtg/blob/8ad261f255859e90a93c446551911278eeec3cae/mcts.py#L87

        # Select
        while node.untried_moves == [] and node.child_nodes != []:  # node is fully expanded and non-terminal
            node = node.uct_select_child()
state.make_move(node.move)

This MCTS is broken. Please try to understand, I will explain a scenario:

Node is a node in the tree which points to states. The states are only copied the first time they are visited. https://github.com/hlynurd/open-mtg/blob/8ad261f255859e90a93c446551911278eeec3cae/mcts.py#L34

So let's say your tree tells you to: 0) Do some things and end your turn, no randomness involved. But your opponent's deck is shuffled. 1) Opponent Draws a card (it turns out it's lava axe) 2) Opponent Plays Mountain 3) Opponent Plays Lava Axe. 4) Opponent goes to combat phase.

Great, no problems yet. But the second (or more) time through:

0) Do some things and end your turn, no randomness involved. 1) Opponent Draws a card (it turns out it's a Mountain instead!) 2) Opponent Plays Mountain 3) Opponent does (?undefined behavior?) 4) Opponent's tries to do the first part of combat, but is it even legal any more? The AI is desynchronized.

The second mountain is not playable. Thus the size of the playable_indicies is smaller. Now in step (3) opponent plays an ability instead, it looks like, assuming it has an ability to play. https://github.com/hlynurd/open-mtg/blob/master/game.py#L106

Basically, you can't overlap MCTS nodes the way you are doing. The move needs to uniquely define the transition to the next state, if it is overloaded, the results will be nonsense.

Meerkov commented 6 years ago

TLDR (I did a bad job explaining) The MCTS node states do not represent the actual game state at that point in the rollout. So the exploration/exploitation will not find good results. It will guide the rollout more or less randomly instead.

hlynurd commented 6 years ago

Hi Meerkov,

Thank you for your continued interest in the code. As explained in the article, I'm not building a full MCTS but implementing the simplified Multi-Armed Bandit version from this paper: http://img.4plebs.org/boards/tg/image/1396/72/1396724222144.pdf.

A proper MCTS, as you are suggesting should be done, would most definitely make a stronger player!

hlynurd commented 6 years ago

I'll close this issue, since the underlying issue is that more search algorithms need to be implemented: https://github.com/hlynurd/open-mtg/issues/2

Meerkov commented 6 years ago

Hi, my mistake. I was misled by the file being "mcts.py" but actually you are only attempting to implement Monte Carlo Search.

MCTS refers specifically to Monte Carlo Tree Search, which this is not an implementation of. I will just suggest you update the name to more accurately represent the goal.