Closed mriganktiwari closed 4 months ago
Also, when I use the HF model with the replaced BitLinear layers, generation isn't working.
`.generate` with the Llama2 model completes generation in ~68 seconds:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='xxxx')
model = AutoModelForCausalLM.from_pretrained(model_name, token='xxxx')

text = "Tell me about Boxing day significance."
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

# swap nn.Linear layers for BitLinear, then time generation again
replace_linears_in_hf(model)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")
```
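For context, a minimal sketch of what a layer-replacement helper like `replace_linears_in_hf` might do (this is not the repo's actual code; the `BitLinear` stand-in below applies BitNet b1.58-style absmean ternary quantization on the fly, an assumption for illustration):

```python
import torch
import torch.nn as nn


class BitLinear(nn.Linear):
    """Hypothetical stand-in: quantizes weights to {-1, 0, +1} at forward time.

    Note the weights are still stored and multiplied as floats, so neither
    memory footprint nor speed improves with this naive version.
    """

    def forward(self, x):
        scale = self.weight.abs().mean().clamp(min=1e-5)
        w_q = (self.weight / scale).round().clamp(-1, 1) * scale
        return nn.functional.linear(x, w_q, self.bias)


def replace_linears(module: nn.Module) -> None:
    """Recursively swap every nn.Linear submodule for a BitLinear, reusing weights."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, BitLinear):
            new = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            new.weight = child.weight
            if child.bias is not None:
                new.bias = child.bias
            setattr(module, name, new)
        else:
            replace_linears(child)
```

A usage sketch: `replace_linears(model)` mutates the module tree in place, after which every former `nn.Linear` runs the quantized forward pass.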
I had a quick look at this repo. In the current state of the code, the binarized weights are still floats, which would explain your observation. It also still performs weight multiplication instead of additions/subtractions, so it does not take advantage of the multiplication-free operator in BitNet 1.58. That said, performance (and potential bugs) aside, the results should be identical to BitNet 1.58. Nice to see such attempts!
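The add/subtract point above can be illustrated with a small sketch (illustrative only, not the repo's code): once weights are restricted to {-1, 0, +1}, a matrix product reduces to sums and differences of the selected inputs, with no multiplications.

```python
import torch

x = torch.randn(2, 4)
# ternary weight matrix, values in {-1, 0, +1}
w = torch.tensor([[1., 0., -1., 1.],
                  [0., -1., 1., 0.]])

ref = x @ w.t()  # ordinary float matmul

# multiplication-free equivalent: for each output row, add the inputs
# where the weight is +1 and subtract those where it is -1
add_sub = torch.stack(
    [x[:, row == 1].sum(dim=1) - x[:, row == -1].sum(dim=1) for row in w],
    dim=1,
)

assert torch.allclose(ref, add_sub, atol=1e-6)
```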
Stale issue message
Describe the bug
When I replace the Linear layers of a HF model (say Llama2-7b-chat) with BitLinear layers, the model size is the same for both. Shouldn't the size be reduced after replacing with BitLinear?
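One way to see why the size stays the same: a tensor's memory footprint is its element count times its element size, so quantizing values to {-1, 0, +1} shrinks nothing while the tensor is still stored as float32. A sketch (the `size_bytes` helper is made up for illustration):

```python
import torch
import torch.nn as nn

lin = nn.Linear(1024, 1024, bias=False)


def size_bytes(t: torch.Tensor) -> int:
    return t.numel() * t.element_size()


ternary_float = torch.sign(lin.weight.detach())  # values in {-1, 0, 1}, still float32
ternary_int8 = ternary_float.to(torch.int8)      # same values, 1 byte each

assert size_bytes(ternary_float) == size_bytes(lin.weight)       # no reduction
assert size_bytes(ternary_int8) == size_bytes(lin.weight) // 4   # 4x smaller
```

Only re-storing (or packing) the ternary values in a narrower dtype actually reduces the footprint; keeping them in the original float tensor does not.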