kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
https://discord.gg/qUtxnK2NMf
MIT License
1.69k stars, 155 forks

Encountering Size Mismatch Error in Updated Code #45

Closed: anonymousA123 closed this issue 5 months ago

anonymousA123 commented 8 months ago

While running the updated code, I encountered the following error: [image not preserved]

I would greatly appreciate any guidance or assistance you can provide to help resolve this issue.


2020zyc commented 8 months ago

It seems self.gamma and self.beta both need to be squeezed, i.e. self.gamma.squeeze().
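As a rough illustration of what squeeze() changes, here is a minimal sketch. The shapes below are hypothetical stand-ins, since the actual shapes of self.gamma and self.beta and the original error are not shown in this thread. It shows only that squeeze() drops size-1 dimensions, which changes how a tensor broadcasts.

```python
import torch

# Hypothetical stand-ins for self.gamma / self.beta; the real shapes in
# the BitNet code are not shown in this thread.
gamma = torch.full((1, 1, 1), 0.5)  # scalar value stored with singleton dims
beta = torch.full((1, 1), 2.0)

# squeeze() with no argument removes every dimension of size 1.
gamma_sq = gamma.squeeze()  # 0-dim tensor
beta_sq = beta.squeeze()    # 0-dim tensor

print(tuple(gamma.shape), "->", tuple(gamma_sq.shape))
print(tuple(beta.shape), "->", tuple(beta_sq.shape))

x = torch.randn(2, 4)

# Broadcasting against the unsqueezed tensor prepends an extra dimension.
y_unsqueezed = x * gamma
# Broadcasting against the 0-dim tensor keeps the shape of x.
y = x * gamma_sq

print(tuple(y_unsqueezed.shape), tuple(y.shape))
```

Note that squeeze() without a dim argument also removes a batch dimension of size 1, so squeeze(dim) may be the safer choice when only one specific axis should go.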

AnonymousA12345 commented 8 months ago

Thank you so much for your reply. May I ask whether you have encountered the following error? When I add squeeze to self.gamma and self.beta, it reports an error: [image not preserved]. Additionally, does training with the latest version of the code still generate gibberish output, as in issue #23?

github-actions[bot] commented 6 months ago

Stale issue message