kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
https://discord.gg/qUtxnK2NMf
MIT License

Fix: Weight quantization sign should be the last operation #59

Closed jmbrito01 closed 1 day ago

jmbrito01 commented 3 months ago

Description: This PR fixes a bug in the weight_quant() function, where the weights do not follow the paper's ordering for when the sign is applied. The sign operation should be the last step, but it is currently applied before the multiplication by the scale (see this for more information).
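To make the ordering issue concrete, here is a minimal, framework-free sketch. It is not the repository's code: the function names `quant_sign_then_scale` and `quant_sign_last` are hypothetical, and the mean-centering and mean-absolute-value scale are assumptions based on one reading of the paper. It only shows how the position of the sign operation changes the set of values the weights can take.

```python
# Illustrative sketch only: names are hypothetical, not the repository's API.
# Assumption: weights are centered by their mean and the scale is the mean
# absolute value of the weights.

def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def quant_sign_then_scale(w):
    """Sign first, then multiply by the scale: values land in {-s, 0, +s}."""
    mean = sum(w) / len(w)
    scale = sum(abs(x) for x in w) / len(w)
    return [sign(x - mean) * scale for x in w]


def quant_sign_last(w):
    """Sign as the final operation: values land in {-1, 0, +1}."""
    mean = sum(w) / len(w)
    scale = sum(abs(x) for x in w) / len(w)
    return [sign((x - mean) * scale) for x in w]


weights = [0.5, -1.5, 2.0, -0.2]
scaled = quant_sign_then_scale(weights)   # real-valued, not 1-bit
binary = quant_sign_last(weights)         # strictly -1 / +1 here
```

Because the scale is non-negative, multiplying by it before taking the sign does not change the result, so with sign last the scale has to be applied somewhere else (for example to the layer output) if it is to have any effect.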

For discussion: The weights and activations in BitLinear are still computed at full precision. It is probably best to convert them to int8, as the paper proposes, to get better speed.
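As a rough illustration of what int8 conversion could look like, the sketch below applies a common absmax scheme to a list of activations. This is an assumption for discussion, not the paper's exact formula or the repository's code: the helper names `absmax_quantize_int8` and `dequantize` are hypothetical, and the symmetric range [-127, 127] is one conventional choice.

```python
# Illustrative sketch only: hypothetical helpers, not BitLinear's implementation.
# Assumption: symmetric absmax quantization into the range [-127, 127].

def absmax_quantize_int8(x):
    """Map floats to integers in [-127, 127] using the largest magnitude."""
    gamma = max(abs(v) for v in x)
    if gamma == 0:
        # All-zero input: nothing to scale.
        return [0] * len(x), 1.0
    scale = 127 / gamma
    q = [max(-127, min(127, round(v * scale))) for v in x]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v / scale for v in q]


activations = [0.1, -0.5, 0.75, -2.0]
q, scale = absmax_quantize_int8(activations)
restored = dequantize(q, scale)
```

The speed benefit mentioned above would only appear if the matrix multiplication itself runs on integer kernels. Quantizing and immediately dequantizing, as in this sketch, reproduces the numerics but not the performance.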

Issue: #52

github-actions[bot] commented 1 week ago

Stale pull request message