DRAn-An opened 1 month ago:
Hello, after thoroughly reviewing the source code of both BitNet and T-MAC, I noticed a high degree of overlap between the two. The implementations seem quite similar, which raises some questions for me: what are the specific differences between BitNet and T-MAC in terms of architecture, algorithms, or optimization strategies? Are there any unique improvements or distinct use cases for each? I would appreciate it if you could clarify the distinctions between them.
Thanks for the question. T-MAC introduces lookup-table methods for low-bit model inference in general: it handles 1-bit, 2-bit, 4-bit (and so on) models, with lookup tables indexed by groups of 2^1, 2^2, or 2^4 weight values. BitNet, by contrast, is a ternary-weight model: every weight takes one of three values (-1, 0, 1), which makes it possible to group weights by 3^n and shrink the storage toward the log2(3) ≈ 1.585 bits per weight that gives b1.58 its name. The TL1 and TL2 kernels therefore group values specifically for ternary weights to achieve better performance. Another difference is that we found only the BitNet kernels produce exactly the same tokens as fp32 inference, because their inference is lossless. Detailed explanations can be found at https://arxiv.org/pdf/2410.16144.
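To make the lookup-table idea concrete, here is a minimal, hypothetical sketch (not code from either repo) of how a LUT can replace multiply-adds in a low-bit dot product. For 1-bit weights in {-1, +1}, activations are taken in groups of 4, and the partial sum for all 2^4 = 16 sign patterns is precomputed; each packed 4-bit weight group then costs a single table lookup. A ternary kernel would instead index a 3^n-entry table, which is exactly why TL1/TL2 group weights differently. The function name `lut_dot` and the packing layout are illustrative assumptions.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Dot product of 1-bit weights (packed 4 per nibble) with int8 activations.
int32_t lut_dot(const std::vector<uint8_t>& w_nibbles,  // one 4-bit pattern each
                const std::vector<int8_t>& act) {       // act.size() == 4 * w_nibbles.size()
    int32_t total = 0;
    for (size_t g = 0; g < w_nibbles.size(); ++g) {
        const int8_t* a = &act[4 * g];
        // Build the 16-entry LUT for this activation group: bit i of the
        // pattern selects +a[i] (bit = 1) or -a[i] (bit = 0).
        int32_t lut[16];
        for (int pattern = 0; pattern < 16; ++pattern) {
            int32_t s = 0;
            for (int i = 0; i < 4; ++i)
                s += ((pattern >> i) & 1) ? a[i] : -a[i];
            lut[pattern] = s;
        }
        total += lut[w_nibbles[g] & 0xF];  // one lookup replaces 4 multiply-adds
    }
    return total;
}

int main() {
    // Weights {+1, -1, +1, +1} -> bits 1,0,1,1 -> pattern 0b1101.
    std::vector<uint8_t> w = {0b1101};
    std::vector<int8_t> act = {3, -2, 5, 1};
    std::printf("dot = %d\n", lut_dot(w, act));  // 3 + 2 + 5 + 1 = 11
    return 0;
}
```

In a real kernel the table would be built once per activation vector and reused across every row of the weight matrix, which is where the speedup comes from; rebuilding it per group here is purely for readability.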
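And for the ternary side, a small sketch of why log2(3) ≈ 1.585 bits per weight is reachable: since 3^5 = 243 ≤ 256, five ternary weights fit in one byte via base-3 encoding, i.e. 8/5 = 1.6 bits per weight. This is an illustration of the counting argument only, not the actual TL1/TL2 packing; `pack5`/`unpack5` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Pack five ternary weights {-1, 0, +1} into one byte (base-3 digits).
uint8_t pack5(const std::array<int8_t, 5>& w) {
    uint8_t code = 0;
    for (int i = 4; i >= 0; --i)
        code = code * 3 + static_cast<uint8_t>(w[i] + 1);  // map {-1,0,1} -> {0,1,2}
    return code;  // max value 3^5 - 1 = 242, fits in 8 bits
}

// Unpack the byte back into five ternary weights.
std::array<int8_t, 5> unpack5(uint8_t code) {
    std::array<int8_t, 5> w{};
    for (int i = 0; i < 5; ++i) {
        w[i] = static_cast<int8_t>(code % 3) - 1;  // map {0,1,2} -> {-1,0,1}
        code /= 3;
    }
    return w;
}

int main() {
    std::array<int8_t, 5> w = {-1, 0, 1, 1, -1};
    uint8_t code = pack5(w);
    std::array<int8_t, 5> back = unpack5(code);
    std::printf("code = %u, round-trip:", code);
    for (int8_t v : back) std::printf(" %d", v);
    std::printf("\n");
    return 0;
}
```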