yxli2123 / LoftQ


How to train with a 2-bit quantized model? #10

Open duany049 opened 6 months ago

duany049 commented 6 months ago

I found the implementation of 4-bit quantization, but I couldn't find a 2-bit one. Can you tell me how to fine-tune a 2-bit quantized model?

yxli2123 commented 6 months ago

Hi @duany049, we have moved our quantization framework into PEFT.

You can use the command here to obtain 2-bit weights: https://github.com/yxli2123/LoftQ/tree/main#apply-loftq-and-save. Just change the flag to --bits 2.

Keep in mind that we only provide 2-bit-equivalent fp16 weights, because a 2-bit backend is not supported by bitsandbytes. If you have limited resources, we suggest loading the 2-bit-equivalent fp16 weights in 4 bits via bitsandbytes, which saves 75% of GPU memory compared to fp16.
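As a rough sketch of the loading path suggested above (the model and adapter paths are placeholders, not from this thread), the 2-bit-equivalent fp16 checkpoint could be loaded in 4-bit NF4 through transformers and bitsandbytes like this:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import PeftModel

    # Store the backbone in NF4 via bitsandbytes: 4 bits per weight instead of 16 (~75% less memory)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # 2-bit-equivalent fp16 backbone produced with --bits 2 (placeholder path)
    base = AutoModelForCausalLM.from_pretrained(
        "path/to/loftq-2bit-fp16-backbone",
        quantization_config=bnb_config,
        device_map="auto",
    )

    # Attach the LoftQ-initialized LoRA adapter for fine-tuning (placeholder path)
    model = PeftModel.from_pretrained(base, "path/to/loftq-adapter", is_trainable=True)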

duany049 commented 6 months ago

Thanks for your reply. I changed --bits from 4 to 2 as you suggested, but the following exception was thrown.

  File "/data2/duan/miniconda3/envs/loftq/lib/python3.11/site-packages/peft/utils/loftq_utils.py", line 215, in loftq_init
    quantized_weight, max_abs, shape = quantizer.quantize_block(res)
                                       ^^^^^^^^^
UnboundLocalError: cannot access local variable 'quantizer' where it is not associated with a value

I fixed the problem by adding a new condition, num_bits == 2, at line 201; the modified code is below:

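    # Fall back to the pure-PyTorch NFQuantizer when bitsandbytes 4-bit is unavailable
    # or when 2-bit quantization is requested (bitsandbytes has no 2-bit backend)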
    if not is_bnb_4bit_available() or num_bits == 2:
        quantizer = NFQuantizer(num_bits=num_bits, device=device, method="normal", block_size=64)

Is my modification correct? Should I submit the code?

duany049 commented 6 months ago

I have fine-tuned a 2-bit llama2-7b with fake quantization. Could I merge the adapter and the 2-bit model into a single 2-bit merged model?

yxli2123 commented 6 months ago

Hi @duany049, please install the up-to-date PEFT with pip install git+https://github.com/huggingface/peft.git. This issue has been resolved in the latest version: https://github.com/huggingface/peft/blob/main/src/peft/utils/loftq_utils.py#L201
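For reference, with up-to-date PEFT the 2-bit LoftQ initialization can also be requested directly via LoftQConfig; a minimal sketch (the model name and LoRA hyperparameters below are illustrative, not taken from this thread):

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoftQConfig, LoraConfig, get_peft_model

    # Load the original full-precision backbone; LoftQ quantizes it during LoRA initialization
    base = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
    )

    # loftq_bits=2 routes through the NFQuantizer branch patched around loftq_utils.py#L201
    loftq_config = LoftQConfig(loftq_bits=2, loftq_iter=1)
    lora_config = LoraConfig(
        init_lora_weights="loftq",
        loftq_config=loftq_config,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    model = get_peft_model(base, lora_config)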

duany049 commented 6 months ago

Thank you for your reply. I have two follow-up questions:

  1. Could I load the 2-bit-equivalent fp16 weights in 2 bits with AutoGPTQ?
  2. Would that save 87.5% of GPU memory compared to fp16?

yxli2123 commented 6 months ago

  1. No, because I don't think AutoGPTQ and NF2 (a variant of NF4) use the same quantization function.
  2. No, since it uses NF4 on the GPU. It can save at most 75% of GPU memory compared to fp16, even though the values are mathematically equivalent to 2-bit values.
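For reference, the 75% and 87.5% figures follow from simple storage arithmetic over the backbone weight bits alone (a rough sketch; it ignores activations, the LoRA adapter, and quantization metadata):

    # Memory saved on weight storage relative to fp16
    fp16_bits = 16
    for storage_bits in (4, 2):
        saving = 1 - storage_bits / fp16_bits
        print(f"{storage_bits}-bit storage: {saving:.1%} saved vs fp16")
    # 4-bit (NF4 via bitsandbytes): 75.0% saved
    # 2-bit (no bitsandbytes backend, so not reachable in practice): 87.5% saved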