Refactor MCTS and Merge RAVE

ryanbhayward / games-puzzles-algorithms

software for CMPUT 355 (initial 396): ugrad course on games, puzzles, algorithms

36 stars 153 forks source link

Refactor MCTS and Merge RAVE #85

Closed dmorrill10 closed 8 years ago

dmorrill10 commented 8 years ago

@imccarten1, I've refactored the MCTS related classes a bit and I think it's quite a bit cleaner now. I've also merged your RAVE implementation from rave into these changes. Do you want to look over these changes?

One thing I changed from your RAVE implementation is that now RaveNode knows how to wrap game states to keep track of the actions during selection and roll-out so that RAVE can be used efficiently. So game states can be used without modification with RaveAgent.

Another thing that might be different is the semantics of the value from backup. Now, the score from the argument should be thought of as the reward for the player and action associated with that node. I think this was backwards before, which is why it needed to be negated in backup. If you play it on a 3x3 hex board with about 10 seconds of search time, you can see that this version of MCTS makes the right actions, so the signs should all be correct now.

imccarten1 commented 8 years ago

There's still a bug in how search handles rollouts and backups. Originally it always backed up the wrong players score. Now it always backs up the score of the player at the root, but backup still behaves as though the stored score is for alternating players not always the root player. Also, unexpanded nodes have acting_player set to the parent's acting player, so info_strings_to_dict and to_dict will display incorrect information for this.

There are also a few style issues:

The line outcome = None in rollout is unnecessary.
info_strings_to_dict and to_dict for RaveNode should be able to reuse the parent class's version of the same methods
test_roll_out has a print statement in it.
rave_moves would make more sense as a set rather than a dictionary since its values are always the same.
select_node doesn't really need to return the game_state when its the same game_state that's passed in.

dmorrill10 commented 8 years ago

Okay, I think I've fixed everything now. @imccarten1, want to take a look?

Removed.
Maybe, but I seem to recall that using a dict with boolean keys is better sometimes. I can't seem to find information on this, but since this patch has already taken quite a while, I won't make that change now. If you'd like, you can open an issue about it and maybe someone can get to it eventually.
It depends, maybe an inherited version of select_node will want to copy the state and update that one instead. I think it's nice that this return makes it explicit that the state was modified. But if you're not convinced you can open an issue and we can discuss it further.