liamt19 closed 7 months ago
This network still doesn't gain over the main network, but does significantly better than the previous bucketed one.
Elo   | 35.41 +- 11.00 (95%)
SPRT  | 8.0+0.08s Threads=1 Hash=16MB
LLR   | 2.97 (-2.94, 2.94) [0.00, 3.00]
Games | N: 2156 W: 712 L: 493 D: 951
Penta | [20, 186, 480, 339, 53]
http://somelizard.pythonanywhere.com/test/517/

Elo   | 23.99 +- 10.95 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=32MB
Games | N: 2002 W: 586 L: 448 D: 968
Penta | [10, 175, 497, 305, 14]
http://somelizard.pythonanywhere.com/test/516/
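For anyone wanting to sanity-check the Elo figures above, here is a minimal sketch that recomputes the point estimates from the W/L/D counts using the standard logistic Elo formula. The function names are hypothetical, and the pentanomial ordering (two losses through two wins per game pair) is an assumption about how the test framework reports it. The error bars are not reproduced here, since that depends on the variance model the framework uses.

```python
import math

def elo_from_wdl(wins, losses, draws):
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_from_penta(penta):
    """Mean per-game score from game-pair counts.

    Assumed ordering: pair totals of 0, 0.5, 1, 1.5 and 2 points,
    i.e. per-game scores of 0, 0.25, 0.5, 0.75 and 1.
    """
    per_game_points = [0.0, 0.25, 0.5, 0.75, 1.0]
    pairs = sum(penta)
    return sum(n * p for n, p in zip(penta, per_game_points)) / pairs

# Counts copied from the two results above.
test_517 = elo_from_wdl(712, 493, 951)
test_516 = elo_from_wdl(586, 448, 968)
print(f"test 517: {test_517:.2f} Elo, test 516: {test_516:.2f} Elo")
```

Under the assumed ordering, the pentanomial counts give the same mean score as the W/L/D totals, and each list sums to half the game count, which is a quick consistency check on the pasted numbers.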
Used the following training regimen:
Arch                   : (768x4 -> 1024)x2 -> 8
Scale                  : 400
1 / FT Regularisation  : 4194304
Batch Size             : 16384
Batches / Superbatch   : 6104
Positions / Superbatch : 100007936
End Superbatch         : 1000
WDL Scheduler          : constant 0
LR Scheduler           : start 0.001 gamma 0.4 drop every 60 superbatches
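The arithmetic behind the regimen can be sketched roughly as follows. The function names are hypothetical, and the exact point at which the trainer applies each learning-rate drop is an assumption (here: superbatches are 1-indexed and the rate is multiplied by gamma after every 60 completed superbatches); the real trainer may differ by one superbatch.

```python
def positions_per_superbatch(batch_size, batches_per_superbatch):
    """Positions seen in one superbatch."""
    return batch_size * batches_per_superbatch

def step_lr(superbatch, start=0.001, gamma=0.4, step=60):
    """Step-decay learning rate for a 1-indexed superbatch.

    Assumption: the first drop takes effect at superbatch step + 1.
    """
    drops = (superbatch - 1) // step
    return start * gamma ** drops

per_superbatch = positions_per_superbatch(16384, 6104)
total_positions = per_superbatch * 1000  # End Superbatch : 1000

print(per_superbatch)    # matches the Positions / Superbatch line
print(total_positions)
print(step_lr(1), step_lr(61), step_lr(121))
```

With these settings the run covers 1000 superbatches, so under the assumed indexing the rate is dropped 16 times before training ends.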
With a dataset consisting of: