Switched bucketed arch to (768x4 -> 1024)x2 -> 8

This network still doesn't gain over the main network, but it performs significantly better than the previous bucketed one.

Elo   | 35.41 +- 11.00 (95%)
SPRT  | 8.0+0.08s Threads=1 Hash=16MB
LLR   | 2.97 (-2.94, 2.94) [0.00, 3.00]
Games | N: 2156 W: 712 L: 493 D: 951
Penta | [20, 186, 480, 339, 53]
http://somelizard.pythonanywhere.com/test/517/
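
For reference, the LLR bounds (-2.94, 2.94) in the SPRT result above match the usual Wald thresholds for error rates of 5%. The sketch below assumes alpha = beta = 0.05, which the test output does not state explicitly. `sprt_bounds` is a hypothetical helper name, not part of any test framework.

```python
import math


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Return the (lower, upper) log-likelihood-ratio stopping thresholds.

    alpha and beta are ASSUMED to be 0.05 each; they are not given in the
    test output and are only inferred from the printed bounds.
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    return lower, upper


lower, upper = sprt_bounds()
print(f"({lower:.2f}, {upper:.2f})")  # (-2.94, 2.94)
```

The reported LLR of 2.97 is above the upper bound, so the test stopped in favour of the upper Elo hypothesis in [0.00, 3.00].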

Elo   | 23.99 +- 10.95 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=32MB
Games | N: 2002 W: 586 L: 448 D: 968
Penta | [10, 175, 497, 305, 14]
http://somelizard.pythonanywhere.com/test/516/
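
The Elo figures above can be cross-checked from the W/L/D counts. This is a minimal sketch using the logistic Elo formula and a normal approximation on the win/draw/loss distribution. It is not necessarily the exact method the test framework uses, and the function names are illustrative only.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def score_to_elo(score: float) -> float:
    """Convert an expected score in (0, 1) to a logistic Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (elo, half-width of an approximate 95% interval)."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    stderr = math.sqrt(var / n)
    low = score_to_elo(mean - Z_95 * stderr)
    high = score_to_elo(mean + Z_95 * stderr)
    return score_to_elo(mean), (high - low) / 2.0


for label, wld in (("8.0+0.08s", (712, 493, 951)), ("40.0+0.40s", (586, 448, 968))):
    elo, margin = elo_with_margin(*wld)
    print(f"{label}: {elo:.2f} +- {margin:.2f}")
```

The Elo values come out at 35.41 and 23.99, and the margins land within a few hundredths of the reported +- 11.00 and +- 10.95.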

Used the following training regimen:

Arch                   : (768x4 -> 1024)x2 -> 8
Scale                  : 400
1 / FT Regularisation  : 4194304
Batch Size             : 16384
Batches / Superbatch   : 6104
Positions / Superbatch : 100007936
End Superbatch         : 1000
WDL Scheduler          : constant 0
LR Scheduler           : start 0.001 gamma 0.4 drop every 60 superbatches
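
As a sanity check on the numbers above, the positions per superbatch follow directly from the batch size and batch count, and the LR scheduler line can be read as a step decay. The sketch below assumes superbatches are numbered from 1 and that the first drop takes effect at superbatch 61. The trainer's exact indexing is not stated here, so treat `learning_rate` as an illustration only.

```python
BATCH_SIZE = 16384
BATCHES_PER_SUPERBATCH = 6104

# Matches the "Positions / Superbatch" entry: 100007936
positions_per_superbatch = BATCH_SIZE * BATCHES_PER_SUPERBATCH


def learning_rate(superbatch: int, start: float = 0.001,
                  gamma: float = 0.4, step: int = 60) -> float:
    """Step decay: multiply by gamma once per `step` completed superbatches.

    ASSUMPTION: superbatches are 1-indexed and the drop applies from
    superbatch step + 1 onwards; the real scheduler may index differently.
    """
    drops = (superbatch - 1) // step
    return start * gamma ** drops


print(positions_per_superbatch)
for sb in (1, 60, 61, 121, 181):
    print(sb, learning_rate(sb))
```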

With a dataset consisting of:

2 billion positions from: T80 July
3 billion positions from: T80 August
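
A rough estimate of how many passes over this data the regimen implies, assuming the run goes through all 1000 superbatches and that "2 billion" and "3 billion" are round figures rather than exact counts:

```python
POSITIONS_PER_SUPERBATCH = 100_007_936
END_SUPERBATCH = 1000

# ASSUMPTION: dataset sizes are approximate round numbers.
dataset_positions = 2_000_000_000 + 3_000_000_000

positions_seen = POSITIONS_PER_SUPERBATCH * END_SUPERBATCH
passes = positions_seen / dataset_positions
print(f"{positions_seen:,} positions seen, about {passes:.1f} passes over the data")
```

Under those assumptions, each position is seen roughly 20 times.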

liamt19 / Lizard #14