DRAn-An opened 1 month ago:
Hello, after thoroughly reviewing the source code of both BitNet and T-MAC, I noticed a high degree of overlap between the two. The implementations seem quite similar, which raises some questions for me: what are the specific differences between BitNet and T-MAC in terms of architecture, algorithms, or optimization strategies? Are there any unique improvements or distinct use cases for each? I would appreciate it if you could clarify the distinctions between them.
Thanks for the question. T-MAC introduces lookup-table methods for low-bit model inference in general: it handles 1-bit, 2-bit, 4-bit (and so on) models, with lookup tables indexed by groups of 2^1, 2^2, or 2^4 weight values. BitNet, by contrast, is a ternary-weight model: every weight takes one of three values (-1, 0, 1), which makes it possible to group weights by 3^n and shrink the storage toward the log2(3) ≈ 1.585 bits per weight that gives b1.58 its name. The TL1 and TL2 kernels therefore group values specifically for ternary weights to achieve better performance. Another difference is that we found only the BitNet kernels produce exactly the same tokens as fp32 inference, because their inference is lossless. Detailed explanations can be found at https://arxiv.org/pdf/2410.16144.
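To make the lookup-table idea concrete, here is a minimal, hypothetical sketch (not code from either repo) of how a LUT can replace multiply-adds in a low-bit dot product. For 1-bit weights in {-1, +1}, activations are taken in groups of 4, and the partial sum for all 2^4 = 16 sign patterns is precomputed; each packed 4-bit weight group then costs a single table lookup. A ternary kernel would instead index a 3^n-entry table, which is exactly why TL1/TL2 group weights differently. The function name `lut_dot` and the packing layout are illustrative assumptions.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Dot product of 1-bit weights (packed 4 per nibble) with int8 activations.
int32_t lut_dot(const std::vector<uint8_t>& w_nibbles,  // one 4-bit pattern each
                const std::vector<int8_t>& act) {       // act.size() == 4 * w_nibbles.size()
    int32_t total = 0;
    for (size_t g = 0; g < w_nibbles.size(); ++g) {
        const int8_t* a = &act[4 * g];
        // Build the 16-entry LUT for this activation group: bit i of the
        // pattern selects +a[i] (bit = 1) or -a[i] (bit = 0).
        int32_t lut[16];
        for (int pattern = 0; pattern < 16; ++pattern) {
            int32_t s = 0;
            for (int i = 0; i < 4; ++i)
                s += ((pattern >> i) & 1) ? a[i] : -a[i];
            lut[pattern] = s;
        }
        total += lut[w_nibbles[g] & 0xF];  // one lookup replaces 4 multiply-adds
    }
    return total;
}

int main() {
    // Weights {+1, -1, +1, +1} -> bits 1,0,1,1 -> pattern 0b1101.
    std::vector<uint8_t> w = {0b1101};
    std::vector<int8_t> act = {3, -2, 5, 1};
    std::printf("dot = %d\n", lut_dot(w, act));  // 3 + 2 + 5 + 1 = 11
    return 0;
}
```

In a real kernel the table would be built once per activation vector and reused across every row of the weight matrix, which is where the speedup comes from; rebuilding it per group here is purely for readability.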
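And for the ternary side, a small sketch of why log2(3) ≈ 1.585 bits per weight is reachable: since 3^5 = 243 ≤ 256, five ternary weights fit in one byte via base-3 encoding, i.e. 8/5 = 1.6 bits per weight. This is an illustration of the counting argument only, not the actual TL1/TL2 packing; `pack5`/`unpack5` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Pack five ternary weights {-1, 0, +1} into one byte (base-3 digits).
uint8_t pack5(const std::array<int8_t, 5>& w) {
    uint8_t code = 0;
    for (int i = 4; i >= 0; --i)
        code = code * 3 + static_cast<uint8_t>(w[i] + 1);  // map {-1,0,1} -> {0,1,2}
    return code;  // max value 3^5 - 1 = 242, fits in 8 bits
}

// Unpack the byte back into five ternary weights.
std::array<int8_t, 5> unpack5(uint8_t code) {
    std::array<int8_t, 5> w{};
    for (int i = 0; i < 5; ++i) {
        w[i] = static_cast<int8_t>(code % 3) - 1;  // map {0,1,2} -> {-1,0,1}
        code /= 3;
    }
    return w;
}

int main() {
    std::array<int8_t, 5> w = {-1, 0, 1, 1, -1};
    uint8_t code = pack5(w);
    std::array<int8_t, 5> back = unpack5(code);
    std::printf("code = %u, round-trip:", code);
    for (int8_t v : back) std::printf(" %d", v);
    std::printf("\n");
    return 0;
}
```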