Train another network, starting from bexp21-e1500, for another 910 epochs. The data swaps in 3B additional FENs from Berserk 20230811. Resulting network is bexp23-e910.
This network also has its weights sorted to prioritize chunking of activated neurons. This reduced 4-chunk sparsity from 14.6% to 13% and gave a small speedup of ~0.75% (bexp23-e910a).
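As a rough illustration of the metric above: with 4-wide chunked sparse matmul, a chunk is skipped only if all four activations in it are zero, so sorting neurons that tend to fire together into the same chunks lowers the fraction of chunks that must be processed. The sketch below (assumed names, not Berserk's actual code) computes that fraction for an int8 activation vector.

```c
#include <stddef.h>
#include <stdint.h>

/* Fraction of 4-wide chunks containing at least one nonzero activation.
 * Sparse matmul only skips all-zero chunks, so grouping co-activated
 * neurons into the same chunk reduces this ratio (the "4-chunk
 * sparsity" in the commit text). Illustrative sketch only. */
static double active_chunk_ratio(const int8_t* acts, size_t n) {
  size_t chunks = n / 4, active = 0;
  for (size_t c = 0; c < chunks; c++) {
    const int8_t* p = acts + 4 * c;
    if (p[0] | p[1] | p[2] | p[3]) /* any neuron in chunk active? */
      active++;
  }
  return chunks ? (double) active / (double) chunks : 0.0;
}
```

Packing the same number of active neurons into fewer chunks is what turns the sorting into a speedup, since each skipped chunk avoids a 4-wide multiply-accumulate.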
Lastly, this patch resolves an issue with FT values exceeding the int8_t max of 127. This was previously fine for normal matrix multiplication, but fails outright with sparse matmul. The FT activation is now effectively CReLU with a max of 127, which does incorrectly clamp a value ~0.000001% of the time (I can live with that for now).
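The clamped activation amounts to the following (a minimal sketch; the function name and input width are assumptions, not Berserk's actual identifiers):

```c
#include <stdint.h>

/* CReLU capped at 127 so the feature-transformer output always fits
 * in int8_t, as the sparse matmul path requires. The upper clamp is
 * the source of the rare (~0.000001%) incorrect value noted above. */
static inline int8_t crelu_int8(int16_t x) {
  if (x < 0)   return 0;    /* standard ReLU lower bound */
  if (x > 127) return 127;  /* cap at int8_t max */
  return (int8_t) x;
}
```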
Bench: 4856801
ELO | 5.64 +- 4.24 (95%) SPRT | 8.0+0.08s Threads=1 Hash=8MB LLR | 2.94 (-2.94, 2.94) [-2.50, 0.50] GAMES | N: 12200 W: 2987 L: 2789 D: 6424 http://chess.grantnet.us/test/33555/
ELO | 7.64 +- 4.78 (95%) SPRT | 40.0+0.40s Threads=1 Hash=64MB LLR | 2.96 (-2.94, 2.94) [-2.50, 0.50] GAMES | N: 9232 W: 2208 L: 2005 D: 5019 http://chess.grantnet.us/test/33560/