Open johanssontan opened 4 months ago
Not sure if this is a bug...
In my understanding of BitNet, training will cost *more* memory, because both the 1-bit weights and the full-precision latent weights (fp16 under mixed-precision training, else fp32) are stored. It only becomes slim and fast at inference time. 🤔
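To make the point above concrete, here is a minimal NumPy sketch of BitNet-style 1-bit weight quantization (an illustration of the idea, not the repo's actual code): during training the full-precision latent weights must be kept for the optimizer update (via a straight-through estimator), so memory holds both the latent tensor and its binarized copy.

```python
import numpy as np

def binarize_weights(w):
    """Sketch of BitNet-style 1-bit quantization:
    center the weights, take the sign, rescale by the mean absolute value."""
    alpha = w.mean()                      # centering term
    wb = np.sign(w - alpha)
    wb[wb == 0] = 1.0                     # map rare exact zeros to +1
    beta = np.abs(w - alpha).mean()       # per-tensor scaling factor
    return wb * beta

# During training, BOTH `w` (latent fp16/fp32) and the binarized copy exist,
# so memory use is higher than a plain Transformer layer of the same size.
# At inference, only the 1-bit weights plus one scale per tensor are kept.
w = np.random.randn(128, 128).astype(np.float32)
wq = binarize_weights(w)
print(sorted(set(np.sign(wq).ravel().tolist())))  # only -1.0 and 1.0 remain
```

This also hints at why training can be *slower* than the fp16 baseline: the quantize/dequantize step runs in every forward pass on top of the usual matmuls.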
I used train.py to train both the BitNet model and the base Transformer model and compared them. I found that BitNet consumes more time and memory while achieving lower loss than the base model, which is inconsistent with what the BitNet paper claims. What could be the reason for this?