When quantizing the (input) activations to the bit-linear layer, NaNs may occur due to division by zero. This is a consequence of the formula in the original paper:
$Quant(x) = Clip(x \times \frac{Q_b}{\|x\|_\infty}, -Q_b + \epsilon, Q_b - \epsilon)$
In the extreme case where all activations are zero, the abs-max $\|x\|_\infty$ is zero, and the scaling term becomes a division by zero.
To fix this, I add `1e-10f` to all abs-max values in the preset kernels. In 99.99% of cases this is a negligible (or no) change, but in the problematic cases it avoids `NaN`s.
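To make the failure mode and the fix concrete, here is a minimal C sketch of per-tensor abs-max activation quantization with the `1e-10f` guard. This is not the repository's actual kernel code: the function name, signature, and the `Qb`/`eps` values are illustrative assumptions.

```c
#include <math.h>
#include <stdint.h>

/* Sketch only: per-tensor absmax quantization of activations to int8,
 * following the paper's Clip(x * Qb / ||x||_inf, -Qb + eps, Qb - eps).
 * Names and constants are assumptions, not the preset kernels. */
static float quantize_activations(const float *x, int8_t *q, int n) {
    float absmax = 0.0f;
    for (int i = 0; i < n; i++) {
        const float a = fabsf(x[i]);
        if (a > absmax) absmax = a;
    }

    /* The fix: without this term, an all-zero input gives absmax == 0,
     * Qb / absmax becomes inf, and 0.0f * inf produces NaN. */
    absmax += 1e-10f;

    const float Qb  = 127.0f;   /* assumed 8-bit range */
    const float eps = 1e-5f;    /* assumed clipping epsilon */
    const float s   = Qb / absmax;

    for (int i = 0; i < n; i++) {
        float v = x[i] * s;
        if (v >  Qb - eps) v =  Qb - eps;   /* clip to (-Qb, Qb) */
        if (v < -Qb + eps) v = -Qb + eps;
        q[i] = (int8_t)roundf(v);
    }

    /* Dequantization scale: x is approximately q * (absmax / Qb). */
    return absmax / Qb;
}
```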