lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Implementing MMCTS #229

Open uduse opened 4 years ago

uduse commented 4 years ago

We have a team of two at the University of Alberta trying to implement MMCTS (AAAI Best Paper 2018) on top of KataGo, hoping to improve KataGo's performance by improving its searching mechanism. We implemented and tested a standalone Memory class that supports the operations mentioned in the paper. See here for our current code. However, we are not exactly sure how to plug that in KataGo and would love to hear some tips from you.
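To make the question concrete, here is a minimal sketch (our own illustrative code, not KataGo's and not our actual Memory class) of the M-MCTS idea from the paper: store feature/value pairs, estimate a memory value as a similarity-weighted average over approximate nearest neighbors, and blend it with the tree's own estimate. All names (`MemoryEntry`, `memoryValue`, `blendedValue`) and the cosine similarity are illustrative choices; the paper uses feature hashing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical memory entry: a stored feature vector with its value estimate
// and the amount of evidence behind it. Names are illustrative, not KataGo's.
struct MemoryEntry {
  std::vector<float> feature;
  float value;   // mean value observed for this state
  float visits;  // how much evidence backs this value
};

// Cosine similarity over raw features, for simplicity. The paper instead
// hashes features and compares in the hashed space.
static float similarity(const std::vector<float>& a, const std::vector<float>& b) {
  float dot = 0.0f, na = 0.0f, nb = 0.0f;
  for (size_t i = 0; i < a.size(); i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Memory value: similarity- and evidence-weighted average over the
// (approximate) nearest neighbors of the query state.
float memoryValue(const std::vector<MemoryEntry>& neighbors,
                  const std::vector<float>& query) {
  float num = 0.0f, den = 0.0f;
  for (const MemoryEntry& e : neighbors) {
    float w = similarity(e.feature, query) * e.visits;
    num += w * e.value;
    den += w;
  }
  return den > 0.0f ? num / den : 0.0f;
}

// Blend the tree's own estimate with the memory estimate; lambda would
// typically decay as the node accumulates its own visits.
float blendedValue(float treeValue, float memValue, float lambda) {
  return (1.0f - lambda) * treeValue + lambda * memValue;
}
```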

Our primary questions are these (assume searching with only 1 thread for simplicity):

  1. Where is the right place to add a MemoryEntry into Memory? Our guess is in Search::playoutDescend, where a new child is added to node.children.
  2. What is the right "value" for Memory to influence, and where is the right place to do so? Our guess is that we want to change the calculation of utility in Search::recomputeNodeStats. However, statistics updates happen in so many places that we don't know how to do this whole thing correctly.

Also, we are still a bit confused about how you manage data during the search. Maybe an overview of the whole process would help.

Thank you for all the effort in making KataGo. It's a daunting feat.

lightvector commented 4 years ago

Fascinating, this sounds like an interesting experiment. I'm curious - what are you planning to use as the feature representation for the approximate nearest neighbor queries? The neural net GPU implementation doesn't currently return the internal representation from just prior to the actual output heads (19x19 with 256 channels would be quite a lot of extra data to shuttle back from the GPU per query, but it could theoretically be done).

Leveraging a good feature representation for determining the similarity of states feels like it should be very important. With modern super-strong neural nets, crude ways of doing it all seem to fail - for example, RAVE (one of the crudest possible ways) is problematic, because the neural net's evaluation so strongly dominates it in quality that it harms the average quality of the search. Similarly, one can find positions where neural nets have blind spots and where simple killer-move-like heuristics would help, but again, on average the neural net's policy output so strongly dominates the crude heuristic that even if the heuristic helps in those special situations, having it is an overall loss of strength and search quality.

I'd love to see if a method like this could work to leverage current nets in a way that can still improve on them (as current nets very notably do have many shortcomings still!).

As for an overview of KataGo's search:

The first thing to realize is that while there is a lot of code, almost all of the accesses to the node statistics during the search are read-only. This includes 100% of searchresults.cpp and many of the functions in search.cpp. Only a few places actually modify them; keeping this in mind can simplify the thinking.

In search.h - you'll want to pay attention mainly to SearchNode, which is a node in the MCTS tree, and NodeStats, which holds all the values that are updated by MCTS. The mean utility is utilitySum / weightSum, the mean estimated lead is leadSum / weightSum, etc. All values are from White's perspective.
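As a rough sketch of the sum/mean relationship described above (our own simplified accumulator, not KataGo's actual NodeStats definition): weighted sums are accumulated, and the means are derived on read.

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-in for the NodeStats idea: keep weighted sums, derive
// means by dividing by the total weight. All values from White's perspective.
struct NodeStats {
  double utilitySum = 0.0;
  double leadSum = 0.0;
  double weightSum = 0.0;

  // Record one weighted observation of utility and lead.
  void addSample(double utility, double lead, double weight) {
    utilitySum += utility * weight;
    leadSum += lead * weight;
    weightSum += weight;
  }
  double meanUtility() const { return utilitySum / weightSum; }
  double meanLead() const { return leadSum / weightSum; }
};
```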

The search itself:

That's basically it for the core of the search. It could use some cleanup, certainly, but hopefully the above is followable.

A remark on that last bit: why do we use recomputeNodeStats instead of something incremental like addLeafValue everywhere? Because if in the future we wanted to try something like Utility(node) = max_{c in children(node)} Utility(c) instead of Utility(node) = sum_{c in children(node)} Utility(c) * Weight(c) / sum_{c in children(node)} Weight(c), or some other crazy rule that cannot be computed incrementally, we could do it there. We can make the node stats of a node be any arbitrary function of the stats of all the children, instead of only things that can be computed incrementally, like a weighted average.
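The contrast between the two aggregation rules can be sketched as follows (our own toy code with illustrative names, not KataGo's): the weighted average can be maintained incrementally from per-playout updates, while the max-over-children rule needs a full pass over the children, which is exactly what a recompute-style function allows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative per-child summary; names are ours, not KataGo's.
struct ChildStats {
  double utility;
  double weight;
};

// The incremental-friendly rule:
// Utility(node) = sum_c Utility(c) * Weight(c) / sum_c Weight(c).
double weightedAverageUtility(const std::vector<ChildStats>& children) {
  double num = 0.0, den = 0.0;
  for (const ChildStats& c : children) {
    num += c.utility * c.weight;
    den += c.weight;
  }
  return num / den;
}

// A rule that cannot be maintained from single-playout increments alone:
// Utility(node) = max_c Utility(c). A full recomputation over all children
// (recomputeNodeStats-style) makes rules like this possible.
double maxUtility(const std::vector<ChildStats>& children) {
  double best = children.front().utility;
  for (const ChildStats& c : children)
    best = std::max(best, c.utility);
  return best;
}
```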

There is actually one such thing that already makes use of this, getValueChildWeights, which downweights very low-performing children so they affect the average a bit less, although the last time I tested it, its effect seemed minimal, so I might remove it.

But anyways, that's the summary. Leaves get their value set through addLeafValue, either because the game is terminal (so we don't need the neural net) or when the nnOutput is computed (we need the neural net to evaluate leaf nodes that aren't game-terminal). Interior nodes of the tree get recomputed via recomputeNodeStats whenever their children change, on the "backward pass" of playoutDescend.
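The descend-then-backup shape described above can be sketched like so (a toy stand-in with our own simplified node type, not KataGo's SearchNode; the descent here trivially follows child 0 instead of doing real move selection):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy node: weighted utility sum plus children. Not KataGo's SearchNode.
struct Node {
  double utilitySum = 0.0;
  double weightSum = 0.0;
  std::vector<std::unique_ptr<Node>> children;
};

// Stands in for addLeafValue: record an evaluated value at a leaf.
void addLeafValue(Node& node, double utility, double weight) {
  node.utilitySum += utility * weight;
  node.weightSum += weight;
}

// Stands in for recomputeNodeStats: rebuild the parent's stats from scratch
// as a function of all its children (here, a plain weighted sum).
void recomputeNodeStats(Node& node) {
  node.utilitySum = 0.0;
  node.weightSum = 0.0;
  for (const auto& c : node.children) {
    node.utilitySum += c->utilitySum;
    node.weightSum += c->weightSum;
  }
}

// Stands in for playoutDescend: descend to a leaf (trivially child 0 here),
// set its value, then recompute interior stats on the backward pass.
void playoutDescend(Node& node, double leafUtility) {
  if (node.children.empty()) {
    addLeafValue(node, leafUtility, 1.0);
    return;
  }
  playoutDescend(*node.children[0], leafUtility);
  recomputeNodeStats(node);
}
```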

Does that help? If you need more help, please feel free to email me privately (see all the commit messages for my email) and we can set up a voice chat or something! :)

uduse commented 4 years ago

Yeah, it helps a ton. Thank you for your prompt reply!

There are a couple of choices for the representation. We planned to start with a flattened one-hot-encoded representation of the board (see here), so it's easier to get things rolling. If that doesn't hurt performance too much, we planned to try using the ownership output as the representation. Both methods represent a board state in a much lower dimension than the one mentioned in the paper, so we are not going to do the feature hashing; we'll just use them as raw features instead. In terms of strength, we have no idea whether this will work, because the paper's experiments were done in Fuego (which has no built-in NN machinery, so merely adding an NN in some way would improve performance)... and now we're starting from a strong NN... but our advisor (Martin Mueller) thinks it's relatively low-cost to try it out on the latest tech stack, so we wanted to give it a shot.
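For concreteness, the flattened one-hot board encoding mentioned above could look something like this (our own illustrative sketch with three channels per point; the channel layout and names are our choices, not what our linked code necessarily does):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative point states for a Go board.
enum Color { EMPTY = 0, BLACK = 1, WHITE = 2 };

// Flattened one-hot encoding: for each board point, three channels
// (empty, black, white), concatenated into a single feature vector.
std::vector<float> oneHotBoard(const std::vector<Color>& board) {
  std::vector<float> feat(board.size() * 3, 0.0f);
  for (size_t i = 0; i < board.size(); i++)
    feat[i * 3 + board[i]] = 1.0f;
  return feat;
}
```

For a 19x19 board this gives a 1083-dimensional vector, far smaller than the 19x19x256 internal representation discussed above.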

I think we will dig around a bit more, then reach out by email to set up a voice chat or something. Thanks again for your help!