Along with the above, this patch also slightly improves the speed of sparse matrix multiplication, bringing over a concept from the regular matrix multiplication: it minimizes the int16_t -> int32_t expansion by adding values together in pairs first. This tested slightly faster and showed no bench variation (checked up to depth 24). It is also unlikely to cause an overflow in Berserk, as its ReLU architecture doesn't clamp values at int8_t max, unlike other engines, which may experience this issue.
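The pairwise trick can be sketched in plain C. This is only a scalar illustration of the idea (the actual patch operates on SIMD vectors, and the function names here are hypothetical):

```c
#include <stdint.h>
#include <stddef.h>

/* Baseline: widen every int16_t element to int32_t before accumulating. */
int32_t sum_widen_each(const int16_t *v, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)v[i];       /* one expansion per element */
    return sum;
}

/* Pairwise variant: add neighbours while still in int16_t, then widen,
 * halving the number of int16_t -> int32_t expansions. Correct only as
 * long as each pair sum stays within int16_t range -- which is the
 * overflow concern discussed above. Assumes n is even for brevity. */
int32_t sum_widen_pairs(const int16_t *v, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i += 2) {
        int16_t pair = (int16_t)(v[i] + v[i + 1]);  /* int16_t addition */
        sum += (int32_t)pair;                        /* one expansion per pair */
    }
    return sum;
}
```

In vectorized code the same idea means doing 16-bit adds before the widening step, so half as many 32-bit lanes need to be produced and accumulated.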
Bench: 4611949
In order to prevent int16_t overflows, this new network (the same as the previous) is quantized with 32 in the input layer.

ELO   | 0.61 +- 1.95 (95%)
SPRT  | 6.0+0.06s Threads=1 Hash=8MB
LLR   | 2.95 (-2.94, 2.94) [-2.50, 0.50]
GAMES | N: 58472 W: 14105 L: 14003 D: 30364
http://chess.grantnet.us/test/33600/
Worth noting there is no improvement going to 4x