lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Implementing MMCTS #229

Open uduse opened 4 years ago

uduse commented 4 years ago

We have a team of two at the University of Alberta trying to implement MMCTS (AAAI Best Paper 2018) on top of KataGo, hoping to improve KataGo's performance by improving its searching mechanism. We implemented and tested a standalone Memory class that supports the operations mentioned in the paper. See here for our current code. However, we are not exactly sure how to plug that in KataGo and would love to hear some tips from you.
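To make the question concrete, here is a minimal sketch (our own illustrative code, not KataGo's and not our actual Memory class) of the M-MCTS idea from the paper: store feature/value pairs, estimate a memory value as a similarity-weighted average over approximate nearest neighbors, and blend it with the tree's own estimate. All names (`MemoryEntry`, `memoryValue`, `blendedValue`) and the cosine similarity are illustrative choices; the paper uses feature hashing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical memory entry: a stored feature vector with its value estimate
// and the amount of evidence behind it. Names are illustrative, not KataGo's.
struct MemoryEntry {
  std::vector<float> feature;
  float value;   // mean value observed for this state
  float visits;  // how much evidence backs this value
};

// Cosine similarity over raw features, for simplicity. The paper instead
// hashes features and compares in the hashed space.
static float similarity(const std::vector<float>& a, const std::vector<float>& b) {
  float dot = 0.0f, na = 0.0f, nb = 0.0f;
  for (size_t i = 0; i < a.size(); i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Memory value: similarity- and evidence-weighted average over the
// (approximate) nearest neighbors of the query state.
float memoryValue(const std::vector<MemoryEntry>& neighbors,
                  const std::vector<float>& query) {
  float num = 0.0f, den = 0.0f;
  for (const MemoryEntry& e : neighbors) {
    float w = similarity(e.feature, query) * e.visits;
    num += w * e.value;
    den += w;
  }
  return den > 0.0f ? num / den : 0.0f;
}

// Blend the tree's own estimate with the memory estimate; lambda would
// typically decay as the node accumulates its own visits.
float blendedValue(float treeValue, float memValue, float lambda) {
  return (1.0f - lambda) * treeValue + lambda * memValue;
}
```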

Our primary questions are these (assume searching with only 1 thread for simplicity):

  1. Where is the right place to add a MemoryEntry into Memory? Our guess is in Search::playoutDescend, where a new child is added to node.children.
  2. What is the right "value" for Memory to influence, and where is the right place to do so? Our guess is that we want to change the calculation of utility in Search::recomputeNodeStats. However, statistics updates happen in so many places that we don't know how to do this whole thing correctly.

Also, we are still a bit confused about how you manage data during the search. Maybe an overview of the whole process would help.

Thank you for all the effort in making KataGo. It's a daunting feat.

lightvector commented 4 years ago

Fascinating, this sounds like an interesting experiment. I'm curious - what are you planning to use as the feature representation for the approximate nearest neighbor queries? The neural net GPU implementation doesn't currently return the internal representation from just prior to the actual output heads (19x19 with 256 channels would be quite a lot of extra data to shuttle back from the GPU per query, but it could theoretically be done).

Leveraging a good feature representation for determining the similarity of states feels like it should be very important. With modern super-strong neural nets, crude ways of doing it all seem to fail - for example, RAVE (one of the crudest possible ways) is problematic, because the neural net's evaluation so strongly dominates it in quality that it harms the average quality of the search. Similarly, one can find positions where neural nets have blind spots and where simple killer-move-like heuristics would help, but again, on average the neural net's policy output so strongly dominates the crude heuristic that even if the heuristic helps in those special situations, having it is an overall loss of strength and search quality.

I'd love to see if a method like this could work to leverage current nets in a way that can still improve on them (as current nets very notably do have many shortcomings still!).

As for an overview of KataGo's search:

The first thing to realize is that while there is a lot of code, almost all of the accesses to the node statistics during the search are read-only. This includes 100% of searchresults.cpp and many of the functions in search.cpp. Only a few places actually modify them; keeping this in mind can simplify the thinking.

In search.h - you'll want to pay attention mainly to SearchNode, which is a node in the MCTS tree, and NodeStats, which holds all the values that are updated by MCTS. The mean utility is utilitySum / weightSum, the mean estimated lead is leadSum / weightSum, etc. All values are from White's perspective.
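As a rough sketch of the sum/mean relationship described above (our own simplified accumulator, not KataGo's actual NodeStats definition): weighted sums are accumulated, and the means are derived on read.

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-in for the NodeStats idea: keep weighted sums, derive
// means by dividing by the total weight. All values from White's perspective.
struct NodeStats {
  double utilitySum = 0.0;
  double leadSum = 0.0;
  double weightSum = 0.0;

  // Record one weighted observation of utility and lead.
  void addSample(double utility, double lead, double weight) {
    utilitySum += utility * weight;
    leadSum += lead * weight;
    weightSum += weight;
  }
  double meanUtility() const { return utilitySum / weightSum; }
  double meanLead() const { return leadSum / weightSum; }
};
```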

The search itself:

That's basically it for the core of the search. It could use some cleanup, certainly, but hopefully the above is followable.

A remark on that last bit: why do we use recomputeNodeStats instead of something incremental like addLeafValue everywhere? Because if in the future we wanted to try something like Utility(node) = max_{c in children(node)} Utility(c) instead of Utility(node) = sum_{c in children(node)} Utility(c) * Weight(c) / sum_{c in children(node)} Weight(c), or some other crazy rule that cannot be computed incrementally, we could do it there. We can make the node stats of a node be any arbitrary function of the stats of all the children, instead of only things that can be computed incrementally, like a weighted average.
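The contrast between the two aggregation rules can be sketched as follows (our own toy code with illustrative names, not KataGo's): the weighted average can be maintained incrementally from per-playout updates, while the max-over-children rule needs a full pass over the children, which is exactly what a recompute-style function allows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative per-child summary; names are ours, not KataGo's.
struct ChildStats {
  double utility;
  double weight;
};

// The incremental-friendly rule:
// Utility(node) = sum_c Utility(c) * Weight(c) / sum_c Weight(c).
double weightedAverageUtility(const std::vector<ChildStats>& children) {
  double num = 0.0, den = 0.0;
  for (const ChildStats& c : children) {
    num += c.utility * c.weight;
    den += c.weight;
  }
  return num / den;
}

// A rule that cannot be maintained from single-playout increments alone:
// Utility(node) = max_c Utility(c). A full recomputation over all children
// (recomputeNodeStats-style) makes rules like this possible.
double maxUtility(const std::vector<ChildStats>& children) {
  double best = children.front().utility;
  for (const ChildStats& c : children)
    best = std::max(best, c.utility);
  return best;
}
```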

There is actually one such thing that already makes use of this, getValueChildWeights, which downweights very low-performing children so they affect the average a bit less, although the last time I tested it, its effect seemed minimal, so I might remove it.

But anyways, that's the summary. Leaves get their value set through addLeafValue, either because the game is terminal (so we don't need the neural net) or when the nnOutput is computed (we need the neural net to evaluate leaf nodes that aren't game-terminal). Interior nodes of the tree get recomputed via recomputeNodeStats whenever their children change, on the "backward pass" of playoutDescend.
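The descend-then-backup shape described above can be sketched like so (a toy stand-in with our own simplified node type, not KataGo's SearchNode; the descent here trivially follows child 0 instead of doing real move selection):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy node: weighted utility sum plus children. Not KataGo's SearchNode.
struct Node {
  double utilitySum = 0.0;
  double weightSum = 0.0;
  std::vector<std::unique_ptr<Node>> children;
};

// Stands in for addLeafValue: record an evaluated value at a leaf.
void addLeafValue(Node& node, double utility, double weight) {
  node.utilitySum += utility * weight;
  node.weightSum += weight;
}

// Stands in for recomputeNodeStats: rebuild the parent's stats from scratch
// as a function of all its children (here, a plain weighted sum).
void recomputeNodeStats(Node& node) {
  node.utilitySum = 0.0;
  node.weightSum = 0.0;
  for (const auto& c : node.children) {
    node.utilitySum += c->utilitySum;
    node.weightSum += c->weightSum;
  }
}

// Stands in for playoutDescend: descend to a leaf (trivially child 0 here),
// set its value, then recompute interior stats on the backward pass.
void playoutDescend(Node& node, double leafUtility) {
  if (node.children.empty()) {
    addLeafValue(node, leafUtility, 1.0);
    return;
  }
  playoutDescend(*node.children[0], leafUtility);
  recomputeNodeStats(node);
}
```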

Does that help? If you need more help, please feel free to email me privately (see all the commit messages for my email) and we can set up a voice chat or something! :)

uduse commented 4 years ago

Yeah, it helps a ton. Thank you for your prompt reply!

There are a couple of choices for the representation. We planned to start with a flattened one-hot-encoded representation of the board (see here), so it's easier to get things rolling. If that doesn't hurt performance too much, we planned to try using the ownership output as the representation. Both methods represent a board state in a much lower dimension than the one mentioned in the paper, so we are not going to do the feature hashing; we'll just use them as raw features instead. In terms of strength, we have no idea whether this will work, because the paper's experiments were done in Fuego (which has no built-in NN machinery, so merely adding an NN in some way would improve performance)... and now we're starting from a strong NN... but our advisor (Martin Mueller) thinks it's relatively low-cost to try it out on the latest tech stack, so we wanted to give it a shot.
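For concreteness, the flattened one-hot board encoding mentioned above could look something like this (our own illustrative sketch with three channels per point; the channel layout and names are our choices, not what our linked code necessarily does):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative point states for a Go board.
enum Color { EMPTY = 0, BLACK = 1, WHITE = 2 };

// Flattened one-hot encoding: for each board point, three channels
// (empty, black, white), concatenated into a single feature vector.
std::vector<float> oneHotBoard(const std::vector<Color>& board) {
  std::vector<float> feat(board.size() * 3, 0.0f);
  for (size_t i = 0; i < board.size(); i++)
    feat[i * 3 + board[i]] = 1.0f;
  return feat;
}
```

For a 19x19 board this gives a 1083-dimensional vector, far smaller than the 19x19x256 internal representation discussed above.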

I think we will dig around a bit more, then reach out by email to set up a voice chat or something. Thanks again for your help!