karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

BitNet (b1.58) support #485

Open EwoutH opened 1 month ago

EwoutH commented 1 month ago

First of all, thanks. We need more ramps.

I was curious what you think of BitNet, and whether llm.c could be a place to facilitate experimenting with it. The papers were extremely promising and got a lot of traction, but while there have been a few (small-scale) reproductions, there isn't an easy ramp for starting to experiment with it yet.

Papers

(image attachment: the BitNet paper references)
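For context, the core idea of b1.58 is to constrain weights to {-1, 0, +1} via absmean quantization. A minimal sketch in plain C of what that quantization step might look like (the function name and layout are mine, not llm.c or reference code):

```c
#include <math.h>

// Sketch of BitNet b1.58 absmean weight quantization:
//   gamma = mean(|W|);  W_q = clip(round(W / (gamma + eps)), -1, +1)
// Returns gamma so a matmul over W_q can be rescaled back to the scale of W.
float quantize_ternary(const float* w, signed char* wq, int n) {
    float gamma = 0.0f;
    for (int i = 0; i < n; i++) { gamma += fabsf(w[i]); }
    gamma /= (float)n;
    const float eps = 1e-6f;
    for (int i = 0; i < n; i++) {
        float v = roundf(w[i] / (gamma + eps));
        if (v > 1.0f)  v = 1.0f;
        if (v < -1.0f) v = -1.0f;
        wq[i] = (signed char)v;   // each weight becomes -1, 0 or +1 (~1.58 bits)
    }
    return gamma;
}
```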

gordicaleksa commented 1 month ago

I don't think we have it on the current roadmap; Andrej can chime in. We have a lot of stuff on the backlog before we get there, including potentially supporting fp8, ZeRO stage 2, etc.

kozuch commented 1 week ago

The problem with BitNet (b1.58) training is that it still keeps the master weights in FP16/BF16 during training (the ternary weights are produced from them on the fly), so memory consumption does not decrease. Anyway, getting support for it would be great! Combined with FP8 training it could bring an improvement.
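To illustrate why training memory does not shrink: the float master weights stay resident and are re-quantized every forward pass, and the backward pass routes gradients straight through to those float weights. A rough sketch, reusing the quantize_ternary sketch above and ignoring activation quantization; the names and layout are illustrative, not llm.c's API:

```c
// Declared in the sketch above: absmean-quantizes w into ternary wq, returns scale gamma.
float quantize_ternary(const float* w, signed char* wq, int n);

// Hypothetical BitLinear-style forward: out = (inp @ W_q^T) * gamma.
// weight_latent stays in float (the training-memory cost mentioned above);
// weight_q is just a scratch ternary copy rebuilt every forward pass.
void bitlinear_forward(float* out, const float* inp,
                       const float* weight_latent, signed char* weight_q,
                       int B, int C_in, int C_out) {
    float gamma = quantize_ternary(weight_latent, weight_q, C_out * C_in);
    for (int b = 0; b < B; b++) {
        for (int o = 0; o < C_out; o++) {
            float acc = 0.0f;
            for (int i = 0; i < C_in; i++) {
                signed char q = weight_q[o * C_in + i];
                // ternary weight: the "multiply" is an add, a subtract, or a skip
                if (q == 1)       acc += inp[b * C_in + i];
                else if (q == -1) acc -= inp[b * C_in + i];
            }
            out[b * C_out + o] = acc * gamma;  // rescale by the absmean scale
        }
    }
    // A backward pass would use a straight-through estimator into weight_latent,
    // so the full-precision copy (and its optimizer state) must be kept around;
    // only at inference time could it be dropped in favor of packed ternary weights.
}
```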