lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Training a katago-compatible network from an SGF dataset without self-play #649

Open windo opened 2 years ago

windo commented 2 years ago

I'm looking to train a network on ~20k beginner games, to provide a sparring partner for beginners that will only think to use the kinds of moves beginners will be familiar with. This is pretty far from what KataGo is striving to do, but looking at the code I think I might be able to make it work and take advantage of the nice engine work that has gone into KataGo.

I think I mostly need to create some glue code to produce finalGameData out of SGF files. All the basic board/move data would be trivial to fill in, but some things will be trickier:

  1. Obviously all of the NN-derived metrics (policy/value/surprise/etc.) will be missing. I will need to come up with safe values to use there that won't throw off the training. Or I could run the SGF games through the network being trained to get value/policy targets, but that would be a lot of extra work and computation and still wouldn't be perfect.
  2. There will be some games missing final ownership and even a final score. I think the model supports setting score and ownership weights in the training data, so I should just be able to indicate this.
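As a rough sketch of the glue I have in mind (all names like `score_weight` and `ownership_weight` here are placeholders of my own invention, not KataGo's actual fields):

```python
import re

def parse_sgf_moves(sgf_text):
    """Extract (player, coord) pairs from a simple SGF game record."""
    # Matches ";B[dd]" / ";W[qd]"; an empty "[]" denotes a pass.
    return [(p, c) for p, c in re.findall(r";([BW])\[([a-t]{0,2})\]", sgf_text)]

def sgf_result(sgf_text):
    """Extract the RE[] property, e.g. 'B+3.5', or None if absent."""
    m = re.search(r"RE\[([^\]]*)\]", sgf_text)
    return m.group(1) if m else None

def game_to_records(sgf_text):
    """Turn one SGF game into per-move training records with loss weights."""
    result = sgf_result(sgf_text)
    records = []
    for turn, (player, coord) in enumerate(parse_sgf_moves(sgf_text)):
        records.append({
            "turn": turn,
            "player": player,
            "move": coord or "pass",
            # One-hot policy target: the move the human actually played.
            "policy_target": coord or "pass",
            # Zero-weight the targets we can't compute from the SGF alone.
            "score_weight": 1.0 if result else 0.0,
            "ownership_weight": 0.0,  # to be filled in later by a strong net
        })
    return records

sgf = "(;GM[1]SZ[19]RE[B+3.5];B[dd];W[qd];B[pp])"
recs = game_to_records(sgf)
```

The idea being that missing targets get a weight of 0 so they contribute nothing to the loss, rather than needing a fabricated "safe" value.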

Any initial reactions to this idea? Other things I should look out for? Might a pull request adding something like this be accepted? Any guidance or ideas how to best do this so that it would fit into katago?

Thanks for your consideration!

rooklift commented 2 years ago

If you just want to get an opponent for beginners, one of the very early KataGo networks might work.

lightvector commented 2 years ago

The idea of training KataGo networks on human games to get a set of nets that accurately predict human play is something I had planned to experiment with myself later this year, so this is certainly something you could try!

I would lean towards filling in all the values you can, as reasonably as you can. The fields actually used by the engine are the policy, value, final score, lead, ownership, and (indirectly) the TD versions of the value and score as well. So at minimum you should fill in all of these if you want the final net to work well with the engine. The fields not used by the engine, and therefore optional, are the score belief distribution, seki prediction, 2-dimensional scoring prediction, next-turn policy, and futurepos predictions.
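For illustration, a minimal per-position target bundle along those lines might look like this (the key names are just illustrative, not KataGo's actual field names, and the TD approximations are an assumption on my part):

```python
import numpy as np

def make_targets(board_size, policy_move_idx, winner_value, final_score,
                 ownership, have_ownership):
    """Build the minimal set of targets the engine actually uses."""
    n = board_size * board_size
    policy = np.zeros(n + 1, dtype=np.float32)  # +1 slot for pass
    policy[policy_move_idx] = 1.0               # one-hot: the human's move
    return {
        "policy": policy,
        "value": np.float32(winner_value),   # +1/-1 from current player's view
        "score": np.float32(final_score),    # final score difference
        "lead": np.float32(final_score),     # crudely reuse score as the lead
        "ownership": ownership.astype(np.float32),
        # With no search data, approximate the TD value/score targets
        # by the final game outcomes.
        "td_value": np.float32(winner_value),
        "td_score": np.float32(final_score),
        # A weight of 0 tells the loss to ignore a target we couldn't compute.
        "ownership_weight": np.float32(1.0 if have_ownership else 0.0),
    }

t = make_targets(19, 72, 1.0, 3.5, np.zeros((19, 19)), have_ownership=False)
```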

Filling in the ownership and final value and final score is pretty straightforward, actually. Just run a strong KataGo net on the final game position after both players pass, use its prediction as the result, and use its ownership as the final ownership. Occasionally the net will predict something weird and advanced because the players actually shouldn't have passed yet, but mostly you'll get something equal to or close to the final result and ownership of the game. Attempting to predict that target from earlier in the game should then train the net to make accurate predictions of the final game result and ownership, under the assumption that the game will be played out from that earlier point to the end by kyu players.
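One way to do that query is through KataGo's JSON analysis engine (`katago analysis`). This is an untested sketch: the query/response field names follow the analysis engine docs, but the turn handling is simplified and you'd feed the query over the engine's stdin in practice:

```python
import json

def final_position_query(moves, komi=7.5, size=19):
    """Build an analysis-engine query for the position after both players pass.

    moves: list of (player, gtp_coord) pairs, e.g. [("B", "D4"), ("W", "Q16")].
    """
    nxt = "W" if moves and moves[-1][0] == "B" else "B"
    other = "B" if nxt == "W" else "W"
    return {
        "id": "final",
        "moves": [[p, c] for p, c in moves] + [[nxt, "pass"], [other, "pass"]],
        "rules": "tromp-taylor",
        "komi": komi,
        "boardXSize": size,
        "boardYSize": size,
        "includeOwnership": True,
        "maxVisits": 100,
    }

def extract_targets(response_line):
    """Pull winrate, score lead, and ownership from one response JSON line."""
    r = json.loads(response_line)
    root = r["rootInfo"]
    return root["winrate"], root["scoreLead"], r["ownership"]

q = final_position_query([("B", "D4"), ("W", "Q16")])
```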

You can see a precursor to KataGo's repo where we actually did do supervised training from human games, before switching to a full self-play loop, here: https://github.com/lightvector/GoNN - but KataGo has developed far enough that the code in that repo isn't directly compatible any more. If there were a PR with some clean and well-designed code to do human-game training compatible with current nets, I would certainly take a look, and possibly credit and use a version of it as a baseline for my own experiments later this year. You are right that the main work would probably be writing a utility to produce the .npz files out of SGF files, with a good construction of as many of the training targets as possible, compatible with the spec here: https://github.com/lightvector/KataGo/blob/master/cpp/dataio/trainingwrite.h#L133
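A rough sketch of the shard-writing side follows. The array names and shapes here are modeled on what appears in KataGo's shuffled training data, but they are assumptions on my part; verify every name, shape, and channel count against the trainingwrite.h spec before relying on this:

```python
import os
import tempfile

import numpy as np

def write_shard(path, n_rows, size=19, num_bin_channels=22, num_global=19):
    """Write one training shard of all-zero rows as a compressed .npz file.

    Channel counts (22 binary spatial planes, 19 global inputs, etc.) are
    guesses for a recent input version -- check trainingwrite.h.
    """
    pos_len = size * size
    # Binary spatial input planes are bit-packed along the spatial axis.
    packed_len = (pos_len + 7) // 8
    np.savez_compressed(
        path,
        binaryInputNCHWPacked=np.zeros(
            (n_rows, num_bin_channels, packed_len), dtype=np.uint8),
        globalInputNC=np.zeros((n_rows, num_global), dtype=np.float32),
        # Policy over board points + pass.
        policyTargetsNCMove=np.zeros((n_rows, 2, pos_len + 1), dtype=np.int16),
        globalTargetsNC=np.zeros((n_rows, 64), dtype=np.float32),
        valueTargetsNCHW=np.zeros((n_rows, 5, size, size), dtype=np.int8),
    )

path = os.path.join(tempfile.gettempdir(), "shard0.npz")
write_shard(path, n_rows=4)
loaded = np.load(path)
```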

One last thing to be aware of is that KataGo's current Python code uses the very outdated TensorFlow 1.15, but we're switching to PyTorch very soon, unless something very unexpected happens: https://github.com/lightvector/KataGo/tree/pytorch-rewrite

That doesn't mean you can't still use TensorFlow for your own experiments if you happen to prefer it, but almost certainly KataGo's main work is going to transition to PyTorch for the foreseeable future.

windo commented 2 years ago

Thanks for the context and for pointing out which fields are important! I had tinkered a little bit, but this should be very helpful for focusing on the fields that actually count rather than trying to figure out all of them :)