dmorrill10 closed this 8 years ago
There's still a bug in how search handles rollouts and backups. Originally it always backed up the wrong player's score. Now it always backs up the score of the player at the root, but `backup` still behaves as though the stored score alternates between players rather than always being the root player's score. Also, unexpanded nodes have `acting_player` set to the parent's acting player, so `info_strings_to_dict` and `to_dict` will display the wrong acting player for them.
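One way to avoid this kind of mismatch, sketched here under assumptions: have the rollout return a payoff for every player, and let each node accumulate the payoff of its own acting player, so `backup` never has to guess whose score it is holding. The child's acting player is taken from the player to move in the parent's state, not copied from the parent node. Apart from `acting_player` and `backup`, the names (`Node`, `add_child`, `total_reward`, `payoffs`) are hypothetical and not the code in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    # Hypothetical layout: acting_player is the player who chose the action
    # leading into this node (None for the root).
    acting_player: Optional[int]
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    total_reward: float = 0.0

    def value(self) -> float:
        """Mean reward for acting_player from taking this node's action."""
        return self.total_reward / self.visits if self.visits else 0.0


def add_child(parent: Node, action: str, player_to_act: int) -> Node:
    """Create the child reached by `action`.

    player_to_act is the player to move in the parent's state, i.e. the
    player who takes `action`; it is not the parent's own acting_player.
    """
    child = Node(acting_player=player_to_act, parent=parent)
    parent.children[action] = child
    return child


def backup(leaf: Node, payoffs: List[float]) -> None:
    """Propagate one rollout result from `leaf` up to the root.

    payoffs[p] is the terminal reward for player p, so each node adds the
    reward of its own acting player and no sign flipping is required.
    """
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        if node.acting_player is not None:
            node.total_reward += payoffs[node.acting_player]
        node = node.parent


# Usage: player 0 plays "a", player 1 replies "b", and player 0 wins.
root = Node(acting_player=None)
a = add_child(root, "a", player_to_act=0)
b = add_child(a, "b", player_to_act=1)
backup(b, [1.0, -1.0])
```

For a two-player zero-sum game the payoffs are just `[r, -r]`; keeping them per player means the stored value always has one fixed meaning, whatever the depth of the node.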
There are also a few style issues:
Okay, I think I've fixed everything now. @imccarten1, want to take a look?
`select_node` will want to copy the state and update that one instead. I think it's nice that this return makes it explicit that the state was modified. But if you're not convinced, you can open an issue and we can discuss it further.
@imccarten1, I've refactored the MCTS-related classes a bit and I think they're quite a bit cleaner now. I've also merged your RAVE implementation from `rave` into these changes. Do you want to look over them?
One thing I changed from your RAVE implementation is that `RaveNode` now knows how to wrap game states to keep track of the actions taken during selection and roll-out, so that RAVE can be used efficiently. This means game states can be used with `RaveAgent` without modification.
Another thing that might be different is the semantics of the score argument to `backup`. That score should now be thought of as the reward for the player and action associated with that node. I think this was backwards before, which is why it needed to be negated in `backup`. If you play on a 3x3 Hex board with about 10 seconds of search time, you can see that this version of MCTS chooses the right actions, so the signs should all be correct now.
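For the change where `RaveNode` wraps game states to track the actions played during selection and roll-out, here is a minimal sketch of the general technique, assuming a state interface with `player_to_act()` and `play(action)`. `GameState`, `ActionRecordingState`, and `AmafStats` are hypothetical names, not this repo's API; the update follows the standard all-moves-as-first (AMAF) rule that RAVE builds on, crediting only the first occurrence of each (player, action) pair after the node's depth.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Protocol, Sequence, Tuple


class GameState(Protocol):
    """Minimal interface assumed for a game state (hypothetical)."""

    def player_to_act(self) -> int: ...

    def play(self, action: Any) -> None: ...


class ActionRecordingState:
    """Wraps an unmodified game state and logs each (player, action) played."""

    def __init__(self, state: GameState) -> None:
        self._state = state
        self.history: List[Tuple[int, Any]] = []

    def play(self, action: Any) -> None:
        # Record who played the action before delegating to the real state.
        self.history.append((self._state.player_to_act(), action))
        self._state.play(action)

    def __getattr__(self, name: str) -> Any:
        # Guard against infinite recursion if _state is not set yet
        # (e.g. while the object is being copied).
        if name == "_state":
            raise AttributeError(name)
        # Anything not defined here (player_to_act, legal_actions, ...) is
        # forwarded to the wrapped state.
        return getattr(self._state, name)


class AmafStats:
    """All-moves-as-first statistics for one node's children."""

    def __init__(self) -> None:
        self.visits: DefaultDict[Tuple[int, Any], int] = defaultdict(int)
        self.total: DefaultDict[Tuple[int, Any], float] = defaultdict(float)

    def update(self, history: Sequence[Tuple[int, Any]], start: int,
               payoffs: Sequence[float]) -> None:
        # Credit each (player, action) pair played at or after position
        # `start` (the node's depth), counting only its first occurrence.
        seen = set()
        for player, action in history[start:]:
            if (player, action) in seen:
                continue
            seen.add((player, action))
            self.visits[(player, action)] += 1
            self.total[(player, action)] += payoffs[player]
```

Because the wrapper forwards unknown attributes to the wrapped state, the game state class itself needs no changes; during selection, a node would look up `(player_to_act, action)` in its `AmafStats` to blend the AMAF estimate with the regular value.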