microsoft / BitNet

Official inference framework for 1-bit LLMs
MIT License

Fix division-by-zero #123

Open · jay-tux opened 1 week ago

jay-tux commented 1 week ago

When quantizing the (input) activations to the BitLinear layer, NaNs may occur due to division by zero. This is a consequence of the formula in the original paper: $\mathrm{Quant}(x) = \mathrm{Clip}\left(x \times \frac{Q_b}{\|x\|_\infty},\ -Q_b + \epsilon,\ Q_b - \epsilon\right)$

In the extreme case where all activations are zero, the abs-max $\|x\|_\infty$ is zero, resulting in a division by zero.
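
For context, a minimal scalar sketch of what an unguarded absmax activation quantization to int8 ($Q_b = 127$) looks like; the function name and layout here are hypothetical simplifications, not the actual preset kernel code:

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical, simplified scalar absmax quantization of one row of
 * activations to int8 (Qb = 127), mirroring the paper's formula. */
static void quantize_row(const float *x, int8_t *out, float *scale, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        amax = fmaxf(amax, fabsf(x[i]));        /* ||x||_inf */
    }
    const float s = 127.0f / amax;              /* div-by-zero when all x[i] == 0 */
    *scale = s;
    for (int i = 0; i < n; i++) {
        float v = x[i] * s;                     /* 0 * inf = NaN when amax == 0 */
        if (v >  127.0f) v =  127.0f;           /* NaN fails both comparisons, */
        if (v < -127.0f) v = -127.0f;           /* so it survives the clipping */
        out[i] = (int8_t) roundf(v);            /* and reaches the cast */
    }
}
```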

To fix this, I add 1e-10f to all maxes in the preset kernels. In 99.99% of cases this is a negligible (or no) change, but in the problematic case it avoids the NaNs.
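
The fix in the same sketch form: adding 1e-10f to the computed max keeps the divisor strictly positive, so an all-zero row quantizes cleanly to zeros (again a hypothetical simplification, not the actual kernel code):

```c
#include <math.h>
#include <stdint.h>

/* Same hypothetical kernel with the epsilon guard applied. */
static void quantize_row_guarded(const float *x, int8_t *out, float *scale, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        amax = fmaxf(amax, fabsf(x[i]));
    }
    const float s = 127.0f / (amax + 1e-10f);   /* divisor can no longer be zero */
    *scale = s;
    for (int i = 0; i < n; i++) {
        float v = x[i] * s;                     /* all-zero row: 0 * finite = 0 */
        if (v >  127.0f) v =  127.0f;
        if (v < -127.0f) v = -127.0f;
        out[i] = (int8_t) roundf(v);
    }
}
```

For any non-degenerate input, 1e-10f is far below the magnitude of a typical abs-max, so the resulting quantized values are effectively unchanged.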

jay-tux commented 1 week ago

@microsoft-github-policy-service agree company="UGent"