duany049 opened this issue 6 months ago
Hi @duany049, we have moved our quantization framework into PEFT.
You can use the command here to obtain 2-bit weights: https://github.com/yxli2123/LoftQ/tree/main#apply-loftq-and-save. Just change to --bits 2.
Keep in mind that we only provide 2-bit-equivalent fp16 weights, because a 2-bit backend is not supported by bitsandbytes. If you have limited resources, we suggest you load the 2-bit-equivalent fp16 weights in 4-bit via bitsandbytes, which saves 75% of GPU memory compared to fp16.
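For intuition, the 75% figure follows directly from bytes per parameter: fp16 stores 2 bytes per weight, while a 4-bit format stores 0.5 bytes (ignoring the small overhead of per-block quantization constants). A quick sketch of the arithmetic:

```python
# Rough memory arithmetic behind the "saves 75% GPU memory" claim.
# fp16 = 16 bits/param, 4-bit = 4 bits/param; quantization constants
# (per-block scales) are ignored, so these are lower bounds.
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    return num_params * bits_per_param / 8 / 1024**3

params_7b = 7_000_000_000  # e.g. a LLaMA-2-7B-sized model
fp16_gb = weight_memory_gb(params_7b, 16)
int4_gb = weight_memory_gb(params_7b, 4)
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB, "
      f"saving: {1 - int4_gb / fp16_gb:.0%}")
```

The saving ratio is independent of model size: 4/16 bits is always a 75% reduction on the weights themselves.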
Thanks for your reply. I changed --bits from 4 to 2 as you suggested, but the following exception was thrown:
File "/data2/duan/miniconda3/envs/loftq/lib/python3.11/site-packages/peft/utils/loftq_utils.py", line 215, in loftq_init
quantized_weight, max_abs, shape = quantizer.quantize_block(res)
^^^^^^^^^
UnboundLocalError: cannot access local variable 'quantizer' where it is not associated with a value
I fixed the problem by adding a new condition, num_bits == 2, on line 201; below is the code:
if not is_bnb_4bit_available() or num_bits == 2:
    quantizer = NFQuantizer(num_bits=num_bits, device=device, method="normal", block_size=64)
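For context, here is an illustrative sketch of what a block-wise 2-bit "normal float" quantizer does, in the spirit of PEFT's NFQuantizer but not its actual implementation: each block is scaled by its max-abs value, then every entry is snapped to the nearest of 2**num_bits codebook levels. The 4-level codebook below is made up for illustration; the real NF codebook is derived from quantiles of a standard normal distribution.

```python
# Hypothetical sketch of 2-bit block quantization (NOT the PEFT code).
CODEBOOK = (-1.0, -0.33, 0.33, 1.0)  # illustrative 2-bit (4-level) codebook

def quantize_block_2bit(block):
    """Return (codes, max_abs): 2-bit index per weight plus the block scale."""
    max_abs = max(abs(x) for x in block)
    codes = [min(range(len(CODEBOOK)),
                 key=lambda i: abs(x / max_abs - CODEBOOK[i]))
             for x in block]
    return codes, max_abs

def dequantize_block_2bit(codes, max_abs):
    """Reconstruct fp weights from 2-bit codes and the block scale."""
    return [CODEBOOK[i] * max_abs for i in codes]

codes, scale = quantize_block_2bit([0.9, -0.1, 0.4, -0.8])
print(codes)                                  # one 2-bit index per weight
print(dequantize_block_2bit(codes, scale))    # lossy reconstruction
```

The reconstruction is lossy by design; LoftQ's low-rank adapter is initialized to absorb exactly this quantization error.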
Is my modification correct? Do I need to submit the code?
I have fine-tuned a 2-bit LLaMA-2-7B with fake quantization. Could I merge the adapter and the 2-bit model into a 2-bit merged model?
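For reference, since the 2-bit weights are stored as fp16 ("fake quantization"), the merge itself is plain fp16 arithmetic, W_merged = W_q + scaling * (B @ A), which is what PEFT's merge_and_unload() computes. Note that the merged matrix generally no longer lies on the 2-bit grid, so the result is an fp16 model rather than a true 2-bit one. A toy sketch with made-up matrices:

```python
# Toy illustration of merging a LoRA adapter into a fake-quantized base
# weight. All matrices below are invented for the example.
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W_q = [[0.5, -0.5], [0.25, 0.0]]   # fake-quantized base weight (d x k)
B = [[2.0], [0.0]]                 # LoRA B (d x r), here r = 1
A = [[1.0, 0.0]]                   # LoRA A (r x k)
scaling = 1.0                      # lora_alpha / r

BA = matmul(B, A)
W_merged = [[W_q[i][j] + scaling * BA[i][j] for j in range(len(W_q[0]))]
            for i in range(len(W_q))]
print(W_merged)  # entries like 2.5 fall off any fixed 2-bit codebook grid
```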
Hi @duany049, please install the up-to-date PEFT via pip install git+https://github.com/huggingface/peft.git. This issue has been resolved in the current version: https://github.com/huggingface/peft/blob/main/src/peft/utils/loftq_utils.py#L201
Thank you for your reply. I have another question:
I found the implementation of 4-bit quantization, but I couldn't find a 2-bit one. Can you tell me how to fine-tune a 2-bit quantization model?
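A sketch of how this could look, assuming PEFT's LoftQ API (LoftQConfig with loftq_bits, and init_lora_weights="loftq" in LoraConfig): 2-bit fine-tuning is configured the same way as 4-bit, just with loftq_bits=2. The model name and LoRA hyperparameters below are illustrative, not prescribed by the thread.

```python
# Hedged sketch: configure LoftQ-initialized LoRA fine-tuning at 2 bits.
# Requires a recent PEFT; model/hyperparameters are examples only.
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

loftq_config = LoftQConfig(loftq_bits=2)   # 2-bit fake quantization
lora_config = LoraConfig(
    init_lora_weights="loftq",             # LoftQ initialization of A, B
    loftq_config=loftq_config,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Base weights are replaced by their 2-bit-equivalent fp16 values and the
# adapters are initialized to compensate the quantization error.
model = get_peft_model(base_model, lora_config)
```

As noted earlier in the thread, the 2-bit weights remain fp16 tensors ("fake quantization"), since bitsandbytes has no 2-bit backend to run true 2-bit inference.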