joey00072 / ohara

Collection of autoregressive model implementations

About GPU memory usage #8

Open JY-CCK opened 6 months ago

JY-CCK commented 6 months ago

Hello. First of all, thanks for sharing a bitnet training code.

I have a question about GPU memory usage. As I understand it, BitNet should reduce VRAM usage compared to fp16/bf16 precision. However, when I comment out this line in train_bitnet.py:

`model = apply_bitlinear(model, target_layers=target_layers)  # comment this to train og llama`

memory usage drops by about 2 GB (13 GB with the BitNet layers vs. 11 GB without them).

Shouldn't using BitNet result in lower memory usage, not higher?
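For context on why this can happen: BitNet training is quantization-aware training, so the full-precision master weights must be kept for the optimizer, and a ternary copy is derived from them on every forward pass. Both live in memory at once during training; the 1.58-bit savings only appear at inference, when the master weights can be dropped and the ternary weights packed. A minimal sketch of that pattern (a hypothetical helper, not the repo's actual `apply_bitlinear` code):

```python
def ternary_quantize(weights):
    """Round weights to {-1, 0, 1}, scaled by their mean absolute value.

    In QAT this quantized copy is an *extra* tensor: the original
    full-precision `weights` must stay around so gradients (via the
    straight-through estimator) can update them.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1e-5
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


w = [0.8, -0.05, 1.3, -0.9]   # full-precision master weights (kept for training)
q, s = ternary_quantize(w)    # derived ternary copy (extra memory each forward)
print(q)  # [1, 0, 1, -1]
```

So during training you pay for the bf16 masters plus the on-the-fly quantized tensors (and their autograd bookkeeping), which is consistent with the ~2 GB increase you measured.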

Thanks.

joey00072 commented 6 months ago

Oh, is it? Let me know.