I did this in the winter and spring when temperatures were cooler. I trained it with KataGo over a period of a few months, starting with a V100 on Google Cloud and later with a single 2060.
I found that using cycling learning rates got me better gains on a single GPU. Learning took only about 1/30 of the time of self-play, so doubling or tripling learning time to make slightly better progress on each network was always worth it.
My learning rate multipliers were 0.6x, 0.1x, and 0.03x.
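The cycling scheme above can be sketched as a simple schedule that steps through the three multipliers in turn. The base learning rate, phase length, and function name here are hypothetical; only the multipliers 0.6, 0.1, and 0.03 come from the post.

```python
# Sketch of a cycling learning-rate schedule.
# BASE_LR and STEPS_PER_PHASE are assumed values for illustration.
BASE_LR = 6e-5
MULTIPLIERS = [0.6, 0.1, 0.03]
STEPS_PER_PHASE = 1000

def lr_for_step(step, steps_per_phase=STEPS_PER_PHASE):
    """Return the learning rate for a training step: each phase of
    `steps_per_phase` steps uses the next multiplier, then the
    cycle repeats from the start."""
    phase = (step // steps_per_phase) % len(MULTIPLIERS)
    return BASE_LR * MULTIPLIERS[phase]
```

Each pass through the cycle drops the rate sharply and then resets, which is what lets extra learning time squeeze more progress out of the same self-play data.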
Current strength is around g170-s6682 or so at 9x9, so weaker than the latest networks and weaker than hzy's 9x9 networks, but already stronger than a lot of the 9x9 Leela Zero bots.
It's a 20b 256f network
https://mega.nz/file/tNJzCSzK#p2OlKCEVLrLi_SBuWpfET__t2fkwTnO20imVQadnZHo
I'm currently working on other things, so I'm dropping this network off here in case anyone is interested.