kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
https://discord.gg/qUtxnK2NMf
MIT License

[BUG] Multi-head attention is a no-op for BitLinear #24

Closed Bsdnbo closed 7 months ago

Bsdnbo commented 7 months ago

Describe the bug
A clear and concise description of what the bug is and what the main root cause error is. Test very thoroughly before submitting.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.


kyegomez commented 7 months ago

@Bsdnbo bit attention has been created, is this solved?