lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Midgame fine-tuning experiment #419

Open pdeblanc opened 3 years ago

pdeblanc commented 3 years ago

I know there's no shortage of ideas for costly experiments, but what about fine-tuning a network (similar to https://lifein19x19.com/viewtopic.php?f=18&t=16995) on a few arbitrary mid-game positions?

This could establish a lower bound on how much further strength improvement is possible in principle: e.g., if the fine-tuned network is 500 Elo points stronger when playing from the selected positions, then it should be possible for some future network to be 500 points stronger even from an empty board.
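To make the bound concrete, the Elo gap would just come from head-to-head matches between the fine-tuned net and the base net starting from those positions. A minimal sketch of the arithmetic, using the standard logistic Elo model and a made-up 86% winrate:

```python
import math

def elo_gain(winrate):
    """Convert a head-to-head winrate into an Elo difference
    using the standard logistic Elo model."""
    return 400.0 * math.log10(winrate / (1.0 - winrate))

# Made-up result: the fine-tuned net beats the base net 86% of the time
# in matches started from the selected mid-game positions.
print(round(elo_gain(0.86)))  # ~315 Elo, a lower bound on improvement still possible
```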

Friday9i commented 3 years ago

Gaining a few hundred Elo from a selected mid-game position is possible if the position is complex enough: by training on it, the net will understand it better and become much more skilled in it. OK, but will it learn general things/concepts useful in other mid-game positions, leading to a better overall net? That's unclear, and it could also degrade the net somewhat (spending training effort on position-specific knowledge that is not useful for 99.99999% of other mid-game positions)... Regarding the empty board, nets have already played millions of games from it: I'm not sure I understand how you would gain anything there?

OmnipotentEntity commented 3 years ago

I think the poster's point isn't to create a better net immediately, but to approximate the amount of improvement left before perfect play is reached.

lightvector commented 3 years ago

The amount of improvement left to optimal, handwavily guessing in Elo space, is very likely at least thousands of Elo points. (EDIT: which, if true, is probably more than would be easy to measure by such a method, since you're limited by how much and how fast training can climb you up that ladder, even after you specialize to just that one midgame position.)

Optimal play is very, very far away. We're not optimal yet even on 9x9 from the opening position! I'm not even sure that draw rates on 9x9 are yet as high as they are for top Chess bots. Maybe they're close, though.

And every board size larger is exponentially harder.

lightvector commented 3 years ago

Wait, that might be a way to try to get at the distance from optimal too!

On small boards, going up in a chain 6x6, 6x7, 7x7, 7x8, 8x8, 8x9, 9x9, 9x10, etc., graph the curve of how much training is necessary for the net to first hit a 75% winrate when the komi is 0.5 above the believed perfect komi, and a 25% winrate when it is 0.5 below. Either use Japanese rules or button Go rules, so as to avoid the awkward 2-point scoring granularity of Chinese rules.

Also graph the curve of what factor of additional training is needed to go from 75% to 85%, to 90%, to 95% across those board sizes, to see if there's a consistent pattern to it. Only among the board sizes that did manage to reach such confidence levels, of course.
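As a sketch of the bookkeeping for one board size (the training steps and winrates below are placeholders, not real measurements; the actual data would come from self-play matches of the archived nets at the perturbed komi):

```python
# Placeholder measurements for one board size: (training step, self-play
# winrate of the favored side at komi 0.5 above the believed perfect komi).
measured = [
    (1_000_000, 0.58),
    (2_000_000, 0.66),
    (4_000_000, 0.77),
    (8_000_000, 0.86),
    (16_000_000, 0.91),
    (32_000_000, 0.96),
]

def first_crossing(data, threshold):
    """Return the first training step at which the winrate reaches the threshold."""
    for step, winrate in data:
        if winrate >= threshold:
            return step
    return None  # this board size never reached that confidence level

thresholds = [0.75, 0.85, 0.90, 0.95]
crossings = {t: first_crossing(measured, t) for t in thresholds}

# Factor of additional training needed between consecutive thresholds,
# e.g. from 75% to 85%, from 85% to 90%, etc.
for lo, hi in zip(thresholds, thresholds[1:]):
    if crossings[lo] is not None and crossings[hi] is not None:
        print(f"{lo:.0%} -> {hi:.0%}: x{crossings[hi] / crossings[lo]:.1f}")
```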

Extrapolating upward from those trend lines might give a loose indicator of how far we are on 19x19.
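For the extrapolation itself, one plausible (but entirely assumed) functional form is training cost growing roughly exponentially with board area, which amounts to a straight-line fit in log space; the numbers below are placeholders:

```python
import numpy as np

# Placeholder data: board area vs. training steps needed to first hit a 75%
# winrate at komi +0.5.  Real values would come from the small-board runs.
areas = np.array([36, 42, 49, 56, 64, 72, 81])           # 6x6 ... 9x9
steps = np.array([2e5, 5e5, 1.2e6, 3e6, 8e6, 2e7, 5e7])  # made-up numbers

# Assume, purely as a guess at the functional form, that cost grows roughly
# exponentially with board area, i.e. linearly in log space.
slope, intercept = np.polyfit(areas, np.log(steps), 1)

# Extrapolate to 19x19 (area 361).  A loose indicator at best, since nothing
# guarantees the trend stays linear in log space that far out.
print(f"{np.exp(slope * 361 + intercept):.3g}")
```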

I don't have the capacity to try this myself right now, but if anyone would like to study it, the full history of networks and games from a training run specifically targeted at small boards from 3x3 up to 15x15, including rectangles, is here: https://test.katagodistributed.org/. This study should be doable entirely with the existing nets there.