Closed thegodone closed 7 months ago
I would love to see an MLX example for BitNet as well, but I would be very cautious about using references from unofficial implementations, especially those from kyegomez. Just a heads up: https://www.reddit.com/r/LocalLLaMA/comments/15spxn3/potential_scammer_on_github/
I don't think we are going to add this to MLX examples. It's still a bit niche. I would love to see a community contribution though!
@awni would you be able to point me in the right direction for how I would think about doing this? Basically, the key is supporting a linear layer whose weights are binary or ternary rather than full precision.
@RonanKMcGovern you have two options:
- Simulate the BitNet ops with casting and quantization / dequantization before the matmuls.
- Implement the quantized kernels themselves with custom gradients.
Maybe you could say more about your goals here, though. In either case training will probably be a lot slower (but the first case would be way slower).
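The first option, simulating the quantized matmul by quantizing and dequantizing the weights around an ordinary matmul, can be sketched in plain NumPy. This is only an illustration: it assumes the absmean ternary scheme from the 1.58-bit BitNet paper, and the function names are hypothetical. An MLX version would use `mx` arrays and, for training, a custom gradient (the second option) so that `round` does not zero out the gradient.

```python
import numpy as np

def ternary_quantize(w, eps=1e-6):
    # Absmean scaling from the BitNet b1.58 paper (a sketch, not the
    # official implementation): gamma = mean(|W|), then weights are
    # rounded and clipped to {-1, 0, +1}.
    gamma = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / gamma), -1.0, 1.0)
    return w_q, gamma

def fake_ternary_linear(x, w):
    # Simulated ternary linear layer: quantize, rescale, then use a
    # normal float matmul. This saves no memory or FLOPs; it only
    # mimics the numerics of a ternary kernel.
    w_q, gamma = ternary_quantize(w)
    return x @ (w_q * gamma).T

# For training, a straight-through estimator is the usual trick:
# forward uses w_q, backward pretends the quantizer was the identity.
# In MLX that would live in a custom-gradient function rather than here.
```

Note that this approach gives none of the VRAM or FLOP savings of real 1-2 bit kernels; it is only useful for studying the training dynamics.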
Thanks Awni, probably my goals were ill-conceived.
Seeing the BitNet and 1.58-bit papers, I had thought there could be merit, both in (a) reducing VRAM and (b) reducing FLOPs, in using smaller 1-2 bit kernels.
However, it appears that:
Yea I agree with your assessment
Can you add an example of BitNet from Microsoft (https://github.com/kyegomez/BitNet)?