unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

AutoModelForSequenceClassification #175

Open asmith26 opened 7 months ago

asmith26 commented 7 months ago

Hi, I'm training a model (essentially copied from https://huggingface.co/blog/unsloth-trl#unsloth--trl-integration):

import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
from unsloth import FastLanguageModel

max_seq_length = 2048  # Supports RoPE Scaling internally, so choose any!
# Get dataset
dataset = load_dataset("imdb", split="train")

# Load Llama model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # Supports Llama, Mistral - replace this!
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", ],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=60,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="./outputs",
        optim="adamw_8bit",
        seed=3407,
    ),
)

trainer_stats = trainer.train()
model.save_pretrained("mistral-finetuned-gpu")

How can I now load this model locally? I'm trying:

from unsloth import FastLanguageModel

model_directory = "./mistral-finetuned-gpu"
model = FastLanguageModel.from_pretrained(model_directory)

unfortunately this yields:

Traceback (most recent call last):
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-d56bc0fcc723>", line 4, in <module>
    model = FastLanguageModel.from_pretrained(model_directory)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 68, in from_pretrained
    model_config = AutoConfig.from_pretrained(model_name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1094, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/configuration_utils.py", line 644, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/configuration_utils.py", line 699, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/utils/hub.py", line 354, in cached_file
    raise EnvironmentError(
OSError: ./mistral-finetuned-gpu does not appear to have a file named config.json. 

I also tried renaming the files to fix this, but then I got:

Traceback (most recent call last):
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-1e1e14080064>", line 1, in <module>
    model = FastLanguageModel.from_pretrained(model_directory)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 79, in from_pretrained
    return dispatch_model.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/mistral.py", line 301, in from_pretrained
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3412, in from_pretrained
    with safe_open(resolved_archive_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooSmall

Many thanks for any help, and this amazing lib!

mathewpan2 commented 7 months ago

The reason you're getting the error is that when you save your model with

model.save_pretrained("mistral-finetuned-gpu")

you're actually saving only the PEFT adapter, not the complete model. You're doing LoRA fine-tuning, since you call

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", ],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
)

To actually use your model after training, you'll have to merge your LoRA weights back into the original model you trained on.

I recommend checking this guide on Hugging Face for more information on how to do that.
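
For what it's worth, here is a minimal sketch of that merge using plain peft/transformers, based on the snippet above. One assumption: mistralai/Mistral-7B-v0.1 is the full-precision base that unsloth/mistral-7b-bnb-4bit was quantized from, and the merge is done against those full-precision weights rather than the 4-bit checkpoint.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumption: mistralai/Mistral-7B-v0.1 is the full-precision base that
# unsloth/mistral-7b-bnb-4bit was quantized from.
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapter saved by model.save_pretrained("mistral-finetuned-gpu")
model = PeftModel.from_pretrained(base_model, "./mistral-finetuned-gpu")

# Fold the LoRA weights into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./mistral-finetuned-merged")

The merged directory can then be loaded like any regular transformers checkpoint.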

danielhanchen commented 7 months ago

@asmith26 @mathewpan2 Actually, this looks like it might be a bug - I'll get back to you all! Sorry!

danielhanchen commented 7 months ago

Oh wait @asmith26, could you try upgrading Unsloth:

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

The above will not install any new dependencies.

I ask because I tried it in Colab and it seems fine - also, I noticed the error trace shows

File "~/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 68, in from_pretrained
    model_config = AutoConfig.from_pretrained(model_name)

which I'm guessing is an older version of Unsloth - AutoConfig is now on line 83 and not 68!
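
After upgrading, the reload should look like the original loading call, just pointed at the saved directory - a minimal sketch, assuming from_pretrained accepts the local adapter directory the same way it accepts a hub model name:

from unsloth import FastLanguageModel

# Assumption: the upgraded loader resolves the base model from the saved
# adapter config, so the local directory can be passed directly.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./mistral-finetuned-gpu",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)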

asmith26 commented 7 months ago

Thanks very much for your help @danielhanchen. After upgrading and using the info from @mathewpan2 (also thanks!), I think I've got this to work:

import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("unsloth/mistral-7b-bnb-4bit")
model = PeftModel.from_pretrained(model, "./mistral-finetuned-gpu")
tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-bnb-4bit")

inputs = tokenizer.encode("This movie was really great!", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(input_ids=inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])

I also tried using unsloth directly, but I can't seem to get it to work (I'm not sure if I need to tell unsloth that this is a SequenceClassification task somehow?):

import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("./mistral-finetuned-gpu")
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

inputs = tokenizer.encode("This movie was really great!", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(input_ids=inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])

danielhanchen commented 7 months ago

@asmith26 Oh I did not see this - apologies - I fixed the first bug you described. On the 2nd issue - yeah, sadly we don't provide a function to load up AutoModelForSequenceClassification :( Sorry :(

asmith26 commented 7 months ago

No problem, thanks for the info - I'm happy to train with unsloth and run inference directly with Hugging Face. So please feel free to close this issue if that's helpful :)

danielhanchen commented 7 months ago

@asmith26 Oh it's fine - it'll be a feature request :)