kyegomez / BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
https://discord.gg/qUtxnK2NMf
MIT License

[BUG] Model size unchanged after replacing Linear layers with BitLinear in a HF model (say Llama2-7b-chat) #40

Closed: mriganktiwari closed this issue 4 months ago

mriganktiwari commented 6 months ago

Describe the bug
When I replace the Linear layers of a HF model (say Llama2-7b-chat) with BitLinear, the model size stays the same. Shouldn't the size be reduced after replacing with BitLinear?
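For reference, a minimal sketch of how the size comparison can be measured, using a small stand-in module rather than the full Llama checkpoint, and assuming `replace_linears_in_hf` from this repo's `bitnet` package accepts any `nn.Module`:

import torch
import torch.nn as nn
from bitnet import replace_linears_in_hf  # this repo's swap helper

def param_bytes(model: nn.Module) -> int:
    # Bytes actually allocated for parameters: element count x element size.
    return sum(p.numel() * p.element_size() for p in model.parameters())

toy = nn.Sequential(nn.Linear(512, 512), nn.Linear(512, 512))  # stand-in model
print(f"before: {param_bytes(toy) / 1e6:.2f} MB")

replace_linears_in_hf(toy)
# If BitLinear stores its weights as fp32 (as discussed below), this prints
# roughly the same number: binarized values still occupy 4 bytes each.
print(f"after:  {param_bytes(toy) / 1e6:.2f} MB")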

Upvote & Fund

Fund with Polar

mriganktiwari commented 6 months ago

Also, when I use the HF model with the replaced BitLinear layers, generation isn't working.

import time
from transformers import AutoTokenizer, AutoModelForCausalLM
from bitnet import replace_linears_in_hf

model_name = "meta-llama/Llama-2-7b-hf"  # "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='xxxx')
model = AutoModelForCausalLM.from_pretrained(model_name, token='xxxx')

text = "Tell me about Boxing day significance."
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Baseline generation with the original nn.Linear layers
start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

# Swap every nn.Linear in the model for BitLinear
replace_linears_in_hf(model)

# Generation again with the BitLinear layers in place
start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")
matkara commented 6 months ago

I had a quick look at this repo. In the current state of the code, the binarized weights are still stored as floats, which would explain your observation. It also still performs weight multiplications rather than additions/subtractions, so it does not take advantage of replacing the multiplication operator as described in BitNet b1.58. That said, performance (and potential bugs) aside, the results should be identical to BitNet b1.58. Nice to see such attempts!
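To make the first point concrete, here is a small sketch (not the repo's code) of why sign-based binarization leaves memory untouched unless the binary values are explicitly packed:

import torch

w = torch.randn(1024, 1024)               # fp32 weights: 4 bytes per element
w_bin = torch.sign(w)                     # values in {-1, 0, +1}, dtype unchanged
print(w_bin.dtype, w_bin.element_size())  # torch.float32 4

# Shrinking the model requires actually packing the signs, e.g. 8 per byte:
bits = (w_bin > 0).to(torch.uint8).flatten()
packed = torch.zeros(bits.numel() // 8, dtype=torch.uint8)
for i in range(8):
    packed |= bits[i::8] << i             # bit i of each byte holds one sign
print(w.numel() * w.element_size(), "bytes ->", packed.numel(), "bytes")  # 32x smaller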

github-actions[bot] commented 4 months ago

Stale issue message