microsoft / BitNet

Official inference framework for 1-bit LLMs

Converting existing models #40

Open virentakia opened 1 month ago

virentakia commented 1 month ago

Amazing work and a fantastic resource; thanks for sharing. This should jump-start LLM usage on low-resource devices.

Quick question: is there a guide for converting existing models to a BitNet-compliant format?

Dead-Bytes commented 1 month ago

I tried Gemma-2-27B, Gemma-2-9B, and many others; all worked fine, with no errors encountered so far, although their 1-bit quants hallucinated a lot.

dawnmsg commented 1 month ago

Unfortunately, no. If a model's weights are not natively ternary, running the conversion function will lose weight information and lead to inaccurate results. We encourage training more 1-bit models from scratch.
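
To make the lossiness concrete, here is a minimal Python sketch (my own illustration, not code from this repo) of the absmean ternary quantization described in the BitNet b1.58 paper; round-tripping ordinary full-precision weights through it shows how much information is discarded:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Absmean ternary quantization (BitNet b1.58):
    scale by the mean absolute weight, round to {-1, 0, +1}."""
    scale = np.abs(w).mean() + 1e-8            # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
    return w_q, scale

# Weights from an ordinary FP16/FP32 checkpoint are not ternary,
# so the round trip is lossy:
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)
w_q, scale = ternary_quantize(w)
w_hat = w_q * scale                            # dequantized approximation
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.1%}")
```

A model trained with this ternary constraint in the loop learns weights that survive the rounding; a model trained in full precision does not, which is why post-hoc conversion degrades quality.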

sean-jang00 commented 1 month ago

@Dead-Bytes By 'tried Gemma-2-27B', do you mean you performed QAT from scratch? How did you quantize the Gemma-2 models?

Dead-Bytes commented 1 month ago

No, I did not perform QAT from scratch; I used the existing I2_S quants available for these models. They show degraded performance, which needs more research to find a way around, since these 1-bit quants come from models without natively ternary weights. They run fine on my octa-core CPU at a human-readable ~7 tokens per second.

sean-jang00 commented 1 month ago

@dawnmsg Would training a 70B model from scratch at 1-bit precision require fewer resources than training at full precision? If similar resources are needed, would ordinary developers still be able to perform QAT on a 70B model?
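
For scale, here is a back-of-the-envelope comparison of weight memory only (a rough sketch; real training also needs optimizer state, gradients, and activations, and QAT keeps full-precision latent master weights, so training cost is not reduced the way inference cost is):

```python
# Rough weight-memory footprint for a 70B-parameter model.
params = 70e9

fp16_gib    = params * 2 / 2**30          # 16 bits per weight
ternary_gib = params * 1.58 / 8 / 2**30   # 1.58 bits per weight (b1.58)

print(f"FP16 weights:    {fp16_gib:,.0f} GiB")     # ~130 GiB
print(f"Ternary weights: {ternary_gib:,.0f} GiB")  # ~13 GiB
```

So the dramatic savings show up at inference time; the training run itself still looks much like full-precision training.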

Deng-Xian-Sheng commented 1 month ago

I think this should be directed at the Qwen2.5 developers. I hope to get a 1-bit Qwen2.5 model, and I expect they want one too.

I don't know how to train from scratch, and I expect I couldn't afford the cost even using the cloud.

We need a foundation.

grctest commented 2 weeks ago

Isn't this what HF1BitLLM/Llama3-8B-1.58-100B-tokens did, though? They started with a base model and fine-tuned it via this method: https://huggingface.co/blog/1_58_llm_extreme_quantization

> This suggests that fine-tuning the model in low-bit mode on a specific dataset causes it to lose much of its general knowledge

> While pre-training models in 1.58 bits is resource-intensive, we’ve demonstrated that, with some tricks, it’s possible to fine-tune existing models to this precision level, achieving efficient performance without sacrificing accuracy.

So it can be done, but it sacrifices some of the original model's knowledge, and you have to top it up with additional datasets: possible, but not perfect.

Training from scratch would avoid this.
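
For reference, the core mechanism behind that kind of low-bit fine-tuning is a BitLinear-style layer trained with a straight-through estimator (STE). Here is a minimal PyTorch sketch (my own illustration under those assumptions; the blog's actual recipe also involves activation quantization and other training tricks):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch of a BitLinear layer for ternary fine-tuning/QAT.

    Full-precision latent weights are kept for the optimizer; the
    forward pass uses their ternary quantization, and the STE lets
    gradients flow back to the latent weights as if no rounding occurred.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)          # absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary weights
        # STE: forward uses w_q, backward treats the rounding as identity
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

# Hypothetical usage: swap nn.Linear for BitLinear, then fine-tune as usual.
layer = BitLinear(1024, 1024)
y = layer(torch.randn(2, 1024))
```

Fine-tuning then gradually pulls the latent weights toward values that quantize cleanly, which is why it needs extra training tokens to recover the knowledge lost at the initial rounding step.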

Deng-Xian-Sheng commented 2 weeks ago

That doesn't help.

