karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

BitNet (b1.58) support #485

Open EwoutH opened 1 month ago

EwoutH commented 1 month ago

First of all, thanks. We need more ramps.

I was curious what you think of BitNet, and whether llm.c could be a place to facilitate experimenting with it. The papers were extremely promising and got a lot of traction, but while there have been a few (small-scale) reproductions, there isn't an easy ramp for starting to experiment with it yet.

Papers

(image attachment: the BitNet paper references)
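For context, the core idea of b1.58 is to constrain weights to {-1, 0, +1} via absmean quantization. A minimal sketch in plain C of what that quantization step might look like (the function name and layout are mine, not llm.c or reference code):

```c
#include <math.h>

// Sketch of BitNet b1.58 absmean weight quantization:
//   gamma = mean(|W|);  W_q = clip(round(W / (gamma + eps)), -1, +1)
// Returns gamma so a matmul over W_q can be rescaled back to the scale of W.
float quantize_ternary(const float* w, signed char* wq, int n) {
    float gamma = 0.0f;
    for (int i = 0; i < n; i++) { gamma += fabsf(w[i]); }
    gamma /= (float)n;
    const float eps = 1e-6f;
    for (int i = 0; i < n; i++) {
        float v = roundf(w[i] / (gamma + eps));
        if (v > 1.0f)  v = 1.0f;
        if (v < -1.0f) v = -1.0f;
        wq[i] = (signed char)v;   // each weight becomes -1, 0 or +1 (~1.58 bits)
    }
    return gamma;
}
```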

gordicaleksa commented 1 month ago

I don't think we have it on the current roadmap; Andrej can chime in. We have a lot of stuff on the backlog before we get there, including potentially supporting fp8, ZeRO stage 2, etc.

kozuch commented 1 week ago

The problem with BitNet (b1.58) training is that it still keeps the master weights in FP16/BF16 during training (the ternary weights are produced from them on the fly), so memory consumption does not decrease. Anyway, getting support for it would be great! Combined with FP8 training it could bring an improvement.
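To illustrate why training memory does not shrink: the float master weights stay resident and are re-quantized every forward pass, and the backward pass routes gradients straight through to those float weights. A rough sketch, reusing the quantize_ternary sketch above and ignoring activation quantization; the names and layout are illustrative, not llm.c's API:

```c
// Declared in the sketch above: absmean-quantizes w into ternary wq, returns scale gamma.
float quantize_ternary(const float* w, signed char* wq, int n);

// Hypothetical BitLinear-style forward: out = (inp @ W_q^T) * gamma.
// weight_latent stays in float (the training-memory cost mentioned above);
// weight_q is just a scratch ternary copy rebuilt every forward pass.
void bitlinear_forward(float* out, const float* inp,
                       const float* weight_latent, signed char* weight_q,
                       int B, int C_in, int C_out) {
    float gamma = quantize_ternary(weight_latent, weight_q, C_out * C_in);
    for (int b = 0; b < B; b++) {
        for (int o = 0; o < C_out; o++) {
            float acc = 0.0f;
            for (int i = 0; i < C_in; i++) {
                signed char q = weight_q[o * C_in + i];
                // ternary weight: the "multiply" is an add, a subtract, or a skip
                if (q == 1)       acc += inp[b * C_in + i];
                else if (q == -1) acc -= inp[b * C_in + i];
            }
            out[b * C_out + o] = acc * gamma;  // rescale by the absmean scale
        }
    }
    // A backward pass would use a straight-through estimator into weight_latent,
    // so the full-precision copy (and its optimizer state) must be kept around;
    // only at inference time could it be dropped in favor of packed ternary weights.
}
```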