Hello.
First of all, thanks for sharing the BitNet training code.
I have a question about GPU memory usage.
As I understand it, BitNet should reduce VRAM usage compared to fp16/bf16 precision.
However, when I comment out this line in train_bitnet.py:

model = apply_bitlinear(model, target_layers=target_layers) # comment this to train og llama

memory usage actually drops by about 2 GB (13 GB with the BitNet layers vs. 11 GB without).
Shouldn't using BitNet result in lower memory usage, not higher?
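
In case it's relevant, here is roughly how I'm measuring peak VRAM. This is a minimal sketch using PyTorch's CUDA memory stats; the model setup and training loop are elided, and apply_bitlinear / target_layers are the ones from train_bitnet.py:

import torch

# reset the peak-memory counter before each run
torch.cuda.reset_peak_memory_stats()

# model = apply_bitlinear(model, target_layers=target_layers)  # toggled between the two runs

# ... a few training steps with the same batch size and sequence length ...

# report the peak allocation for this run
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")

Both runs use identical settings apart from the apply_bitlinear line.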
Thanks.