lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

I'm curious about the parameters of the new training. #240

Open 22nsuk opened 4 years ago

22nsuk commented 4 years ago

Thank you for uploading a new model. I want to know the settings you used when you trained using external data. Could you make the file public? Thank you always for your hard work.

lightvector commented 4 years ago

Thanks! There isn't one single file, and I was also a bit sloppy about recording things and documenting which data was used at what time, since for this last little bit of the run I was mostly doing exploratory experimentation rather than trying to collect careful controlled data.

But I think a decent summary can be found here, see the section under "External-data-biased nets": https://d3dndmfyhecmj0.cloudfront.net/g170/neuralnets/README.txt

For the numeric parameters, I used all the values you can find directly in https://github.com/lightvector/KataGo/blob/master/cpp/configs/training/selfplay8b20.cfg (this is actually the parameters used for the run ever since we moved to 20-block or bigger networks).

These were the parameters for external data:

```
startPosesProb = <whatever proportion was used at the time, mostly 0.05, but dropped to 0.01 for the final bit of training this week>
startPosesLoadProb = 0.05  # Stochastically discards all but this fraction of positions on each selfplay instance (of which there are 5), to save memory
startPosesTurnWeightLambda = 0.01  # Slightly biases towards early-game positions
startPosesPolicyInitAreaProp = 0.02  # Number of moves to additionally play with raw policy at temperature 1, as a proportion of the board area (e.g. 361)
hintPosesProb = <whatever proportion was used at the time, 0 or 0.05 depending on which segment of the run>
```
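To make those proportions concrete, here is a small back-of-envelope sketch (my own illustration, not KataGo's actual implementation) of what the values above work out to on a 19x19 board:

```python
# Illustrative arithmetic only; variable names here are made up and do not
# correspond to KataGo's internal code.

BOARD_AREA = 19 * 19  # 361

start_poses_load_prob = 0.05
start_poses_policy_init_area_prop = 0.02

# Expected number of extra raw-policy moves played from a start position:
policy_init_moves = start_poses_policy_init_area_prop * BOARD_AREA
print(policy_init_moves)  # 7.22, i.e. about 7 moves

# If each of the 5 selfplay instances independently keeps 5% of positions,
# the chance a given position is loaded by at least one instance:
p_loaded_somewhere = 1 - (1 - start_poses_load_prob) ** 5
print(round(p_loaded_somewhere, 4))  # 0.2262
```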

The start positions used for most of this were a mix of: GoGoD summer edition 2020 (games from 2015 or later), Fox 9d games from here, from 150*.sgf through 157*.sgf (roughly mid-2017 to late 2019), and Leela Zero gating games from Feb 10 to April 9.

Hint positions were generated using this code in game mode, as well as older versions of this code (since I was refining it during the process), mostly with 2000 playouts at various times and whichever KataGo nets were recent at the time. Additionally, for the very last segment documented in https://d3dndmfyhecmj0.cloudfront.net/g170/neuralnets/README.txt, I threw in hint positions generated from https://github.com/isty2e/Baduk-test-positions using "tree-mode" (you can look at the code for the differences), as well as a small number of other blind spot positions.

Mi Yuting's flying dagger joseki training data was produced by manually transcribing OGS's joseki dictionary, then adding a lot of variations, including ones that KataGo liked to play and some pointed out by other users, and then duplicating the variations across tons of corner stone configurations and permutations using lots of SGF copy-paste. These were used as start positions for games, or as hint positions, in the same way as above, but with startPosesPolicyInitAreaProp = 0.005 to keep closer to the lines in the SGFs. The final files are quite enormous SGFs due to all the permutations, if you're morbidly curious: daggerpositions.zip
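The duplication across corner configurations described above can be sketched as generating the eight dihedral symmetries of a joseki line. This is purely an illustration of the idea under my own assumptions, not the actual tooling that was used:

```python
# Hypothetical sketch: take one corner line as a list of (col, row) moves
# and produce its 8 board symmetries (4 rotations x 2 reflections).

BOARD_SIZE = 19

def apply_symmetry(move, k):
    """Apply the k-th of the 8 dihedral symmetries to a (col, row) coordinate."""
    x, y = move
    n = BOARD_SIZE - 1
    if k & 1:
        x = n - x       # horizontal flip
    if k & 2:
        y = n - y       # vertical flip
    if k & 4:
        x, y = y, x     # reflect across the main diagonal
    return (x, y)

# A made-up asymmetric opening line in (col, row) coordinates:
line = [(16, 3), (3, 2), (15, 16)]

variants = {tuple(apply_symmetry(m, k) for m in line) for k in range(8)}
print(len(variants))  # 8 distinct orientations for an asymmetric line
```

A real pipeline would also need to permute move order and add the surrounding whole-board context, which is why the resulting SGF files grow so large.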


For the record, I will be the first to admit that all of the above is messy. And unfortunately it is messier still due to some bugfixes and re-running of the hint position generation along the way, and due to me accidentally turning off one of the selfplay machines for 2 days at one point through an error while trying to configure things - maybe not unlike the way LZ was buggy all the way until LZ130ish due to lots of early figuring-things-out. It would be a bit cleaner if I were to re-run things again.

But I consider the run up to the nets in https://github.com/lightvector/KataGo/releases/tag/v1.4.0 to still be the demonstrably "clean" and principled segment of the run, purely self-play driven. Everything after that is just me fooling around with the last bit of the run to get some intuition for what effect a bunch of different things have, rather than attempting to demonstrate or prove any sort of clean or trustworthy experiment.

I also don't think this is a long-term ideal way forward. The "pure" segment of the run had already reached this level of strength without any of the above messiness, and adding it didn't seem to actually affect strength all that much; it just fixed some blind spots, each of which is individually very rare in play except for the flying dagger. Which is certainly nice, of course. I am currently pondering possible better and much more fundamental ways...

yzyray commented 4 years ago

I wonder if the Final Neural Networks continue from the Experimental Neural Nets, or skip the Experimental Neural Nets and continue from May 10's nets.

lightvector commented 4 years ago

They continue. This should be mentioned here https://github.com/lightvector/KataGo#current-status-and-history and https://d3dndmfyhecmj0.cloudfront.net/g170/neuralnets/README.txt once you get to parts of the tables describing the LR drop nets. Let me know if this was not clear. Thanks!

yzyray commented 4 years ago

> They continue. This should be mentioned here https://github.com/lightvector/KataGo#current-status-and-history and https://d3dndmfyhecmj0.cloudfront.net/g170/neuralnets/README.txt once you get to parts of the tables describing the LR drop nets. Let me know if this was not clear. Thanks!

Thanks for the detailed reply.