Open virentakia opened 1 month ago
I tried Gemma-2-27B, Gemma-2-9B, and many others; all worked fine with no errors encountered so far, although their 1-bit quants were hallucinating a lot.
Unfortunately, no. If a model's weight parameters are not natively ternary, using the conversion function will result in the loss of weight values, leading to inaccurate results. We encourage more training of 1-bit models from scratch.
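To make the information loss concrete, here is a minimal NumPy sketch of absmean ternarization (the rounding scheme described for BitNet b1.58) applied to ordinary full-precision weights; the matrix size and initialization are made up purely for illustration:

```python
import numpy as np

# Illustrative only: post-hoc ternarization of full-precision weights using
# absmean rounding. Weights that were never trained to be ternary lose
# information in this step, which is why converted models degrade.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)  # FP weights

scale = np.abs(W).mean()                          # gamma = mean absolute value
W_ternary = np.clip(np.round(W / scale), -1, 1)   # values in {-1, 0, +1}
W_dequant = W_ternary * scale                     # what the 1-bit kernel effectively uses

rel_err = np.linalg.norm(W - W_dequant) / np.linalg.norm(W)
print(f"relative weight error after ternarization: {rel_err:.2%}")
```

A model trained from scratch (or fine-tuned) with this constraint learns weights that already sit near the ternary grid, so the rounding step costs far less accuracy.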
@Dead-Bytes By 'tried Gemma-2-27B', do you mean that you performed QAT from scratch? How did you quantize the Gemma-2 models?
No, I did not perform QAT from scratch. I used the existing i2_s quants available for the models; they show degraded performance, which needs more research to work around, since the 1-bit quants are not built from native ternary weights. They run fine on my octa-core CPU at a human-readable 7 tokens per second.
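For reference, this is roughly the workflow, sketched in Python via subprocess. The setup_env.py / run_inference.py script names and flags are what I recall from the bitnet.cpp README and may differ in your checkout; the model repo and output path are just examples:

```python
import subprocess

# Download a natively ternary model, convert it to GGUF with the i2_s quant
# type, and build the project (script names/flags assumed from the README).
subprocess.run([
    "python", "setup_env.py",
    "--hf-repo", "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    "-q", "i2_s",
], check=True)

# Run CPU inference with the converted GGUF file (path assumed).
subprocess.run([
    "python", "run_inference.py",
    "-m", "models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf",
    "-p", "Explain ternary weights in one sentence.",
    "-n", "64",
], check=True)
```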
@dawnmsg Would training a 70B model from scratch with 1-bit precision require fewer resources than training with full precision? If similar resources are needed, would general developers still be able to perform QAT for a 70B model?
I think this should be directed at the Qwen2.5 developers. I hope to get a 1-bit Qwen2.5 model, and I expect they want one too.
I don't know how to train from scratch, and I probably could not afford the cost either, even using the cloud.
We need a foundation
Isn't this what HF1BitLLM/Llama3-8B-1.58-100B-tokens did though? They started with a base model and fine-tuned it via this method: https://huggingface.co/blog/1_58_llm_extreme_quantization
This suggests that fine-tuning the model in low-bit mode on a specific dataset causes it to lose much of its general knowledge
While pre-training models in 1.58 bits is resource-intensive, we’ve demonstrated that, with some tricks, it’s possible to fine-tune existing models to this precision level, achieving efficient performance without sacrificing accuracy.
So it can be done, but it sacrifices original model data and you need to top it up with additional datasets, so possible but not perfect.
Training it from scratch would avoid this.
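For anyone curious what that fine-tuning trick looks like mechanically, here is a toy PyTorch sketch of a BitLinear-style layer with absmean ternary weights and a straight-through estimator. It only illustrates the general idea of quantization-aware fine-tuning, not the actual recipe from the blog post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Toy layer: forward uses absmean-ternarized weights; the straight-through
    estimator lets gradients update the latent full-precision weights."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary {-1, 0, +1} * scale
        w_ste = w + (w_q - w).detach()                  # straight-through estimator
        return F.linear(x, w_ste)

# Fine-tuning step sketch: the latent FP weights adapt so the ternarized
# network recovers accuracy on whatever data it is fine-tuned with.
layer = TernaryLinear(64, 64)
opt = torch.optim.AdamW(layer.parameters(), lr=1e-3)
x, target = torch.randn(8, 64), torch.randn(8, 64)
loss = F.mse_loss(layer(x), target)
loss.backward()
opt.step()
```

The downside discussed above follows directly: the weights drift toward whatever fine-tuning data is used, which is why general knowledge can be lost unless the data mix is broad.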
no help
Amazing work and a fantastic resource, thanks for sharing - this should jump-start usage of LLMs on low-resource devices.
Quick question - is there a guide to convert existing models to a BitNet-compliant format?