huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Error when Using 8-bit Quantization #1616

Closed: JhonDan1999 closed this issue 2 months ago

JhonDan1999 commented 5 months ago

I am encountering a data type mismatch error when using 8-bit quantization with the PEFT library and SFTTrainer for fine-tuning a language model. The error occurs during the generation phase after loading the fine-tuned model.

Here's an overview of my workflow:

  1. I fine-tuned a base model using the SFTTrainer from the TRL library.
  2. After fine-tuning, I saved the adapter using PEFT (a rough sketch of steps 1-2 is below).
  3. I loaded the fine-tuned model using PEFT and the BitsAndBytesConfig for 8-bit quantization.
  4. I merged the adapter with the base model using merge_and_unload().
  5. During the generation phase, I encountered a data type mismatch error.
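
For reference, steps 1-2 looked roughly like this; the model name, dataset, and LoRA hyperparameters are placeholders rather than my exact configuration, and the exact SFTTrainer arguments vary by trl version:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
train_dataset = load_dataset("imdb", split="train")  # placeholder dataset

model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# LoRA adapter configuration (placeholder hyperparameters)
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
)
trainer.train()

# step 2: this saves only the LoRA adapter weights, not the base model
trainer.save_model(peft_model_output_dir)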

Here's the code snippet for loading the fine-tuned model:

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

config = PeftConfig.from_pretrained(peft_model_output_dir)

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=True,
    # the bnb_4bit_* settings below only apply to 4-bit loading and
    # should be ignored when load_in_8bit=True; kept here exactly as I had them
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
)

# load the base model with 8-bit quantization
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)

# attach the fine-tuned adapter, then merge it into the base weights
ft_model = PeftModel.from_pretrained(base_model, peft_model_output_dir)
ft_model = ft_model.merge_and_unload()
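
For completeness, the generation call that triggers the error looks roughly like this (the prompt and generation settings are placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# placeholder prompt; the RuntimeError is raised inside generate()
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(ft_model.device)
outputs = ft_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))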

The error message I'm getting is:

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != signed char

I have also printed the data types of the model parameters, and they appear to be a mix of torch.float16 and torch.int8, which is expected when using 8-bit quantization.
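
For reference, this is roughly how I inspected the dtypes:

from collections import Counter

# count parameters by dtype; the result is a mix of torch.float16 and torch.int8
print(Counter(p.dtype for p in ft_model.parameters()))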

I would appreciate any guidance on why I am facing this issue.

NOTE: this issue did not appear when I loaded the model in 4-bit, but after fine-tuning I want to use the model in 8-bit to get better accuracy (please correct me if my hypothesis is not correct).

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

JhonDan1999 commented 4 months ago

This still needs to be addressed.

younesbelkada commented 3 months ago

Hi, thanks for the issue and apologies for the delay! What peft version are you using?

JhonDan1999 commented 3 months ago

Hi @younesbelkada, it is peft version 0.11.1.

younesbelkada commented 3 months ago

Thanks! Can you print the model after merging it? Alternatively, can you share a model on the Hub that we can look into?
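
For example, something like:

# after merge_and_unload(), print the module tree to check which
# layers are still quantized (e.g. bitsandbytes Linear8bitLt modules)
print(ft_model)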

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.