kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
https://discord.gg/qUtxnK2NMf
MIT License
1.69k stars, 155 forks

Fixed shape of beta and gamma for proper broadcasting #37

Closed: dariocazzani closed this 8 months ago

dariocazzani commented 8 months ago

Fixed shape of beta and gamma for proper broadcasting
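
The diff itself is not shown on this page, so the following is only a minimal sketch of the kind of broadcasting problem the title describes, not the actual change. It assumes `gamma` is a per-token activation scale and `beta` a per-row weight scale; the tensor shapes, the reductions, and the use of `keepdim=True` are all illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes: 2 sequences, 3 tokens each, 4 features per token.
x = torch.randn(2, 3, 4)
# Hypothetical weight matrix: 5 output rows, 4 input features.
w = torch.randn(5, 4)

# Reducing without keepdim drops the reduced axis: gamma_flat has shape (2, 3),
# which cannot broadcast against x of shape (2, 3, 4).
gamma_flat = x.abs().max(dim=-1).values
try:
    _ = x / gamma_flat
    flat_broadcasts = True
except RuntimeError:
    flat_broadcasts = False

# keepdim=True keeps a size-1 axis, so the scales broadcast as intended.
gamma = x.abs().max(dim=-1, keepdim=True).values  # shape (2, 3, 1)
beta = w.abs().mean(dim=1, keepdim=True)          # shape (5, 1)

x_scaled = x / gamma  # each token scaled by its own max magnitude
w_scaled = w / beta   # each row scaled by its own mean magnitude
```

Keeping the reduced axis as size 1 is one common way to make such scale tensors line up; reshaping with `unsqueeze(-1)` would be an equivalent alternative.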