Load Unsloth-FT-Merged-Model with AutoModel Attribute Error #896

Open carstendraschner opened 1 month ago

carstendraschner commented 1 month ago

Hello :)

We used the default Unsloth Colab pipeline to fine-tune Llama 3.1 8B and replicated it as a notebook in an Azure environment: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing#scrollTo=QmUBVEnvCDJv The fine-tuning worked, and we tested it via inference using FastLanguageModel. We then merged the model to 16-bit and stored it locally so that we can load and run it with the default AutoModel.

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
model_inputs = tokenizer(template.format(
            "Who is A. Dumbledore?",
        ), return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

But we get: AttributeError: 'LlamaForCausalLM' object has no attribute 'max_seq_length'. Could you help?

Full stack trace:

AttributeError                            Traceback (most recent call last)
Cell In[46], line 1
----> 1 generated_ids = model.generate(**model_inputs)
      2 tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

AttributeError: 'LlamaForCausalLM' object has no attribute 'max_seq_length'

carstendraschner commented 1 month ago

This is also reproducible with the default Unsloth Colab notebook, which contains this step:

# I highly do NOT suggest - use Unsloth if possible
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    "lora_model",  # assumed: same adapter directory as the tokenizer call below
    load_in_4bit = load_in_4bit,
    low_cpu_mem_usage = True
)
tokenizer = AutoTokenizer.from_pretrained("lora_model")

If I add the code from above to generate output:

inputs = tokenizer(
[
    alpaca_prompt.format(  # prompt template defined earlier in the notebook
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

I also get:

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?

### Input:

### Response:

This might help you reproduce the issue, even though this variant is based on the LoRA adapter. Kind regards

carstendraschner commented 1 month ago

I was also wondering why:

File ~/code/genai-ml/.venv/lib/python3.10/site-packages/unsloth/models/llama.py:864, in CausalLM_fast_forward.<locals>._CausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)

appears in the initial stack trace, as the code should not call Unsloth, right?

carstendraschner commented 1 month ago

Interesting: when I use the stored files in an environment where Unsloth is not installed, model loading and inference work:

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    awq_model_path, device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(awq_model_path, padding_side="left")
model_inputs = tokenizer("""

Who is A. Dumbledore?<|eot_id|>
""", return_tensors="pt").to("cuda")
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]

danielhanchen commented 1 month ago

Oh wait - when you load a model locally, do not call Unsloth or load Unsloth anywhere in your code! It'll patch over everything, causing issues
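
To make "patch over everything" concrete, here is a toy sketch of how an import-time patch on a shared class can produce this kind of AttributeError. The class and function names (PlainCausalLM, apply_patch, fast_forward) are hypothetical stand-ins invented for illustration; this is not Unsloth's or transformers' actual code.

```python
# Toy illustration only: hypothetical stand-ins, not Unsloth or transformers code.

class PlainCausalLM:
    """Stand-in for a model class defined by a base library."""

    def forward(self, prompt):
        return f"plain forward: {prompt}"

    def generate(self, prompt):
        # forward() is looked up on the class at call time,
        # so a later patch to the class affects existing instances too.
        return self.forward(prompt)


def apply_patch():
    """Stand-in for a patch that replaces forward() on the shared class."""

    def fast_forward(self, prompt):
        # The patched method assumes an attribute that only the patching
        # library's own loader would set on the model.
        return f"fast forward (max_seq_length={self.max_seq_length}): {prompt}"

    PlainCausalLM.forward = fast_forward


model = PlainCausalLM()        # loaded the "plain" way, so no max_seq_length
before = model.generate("hi")  # still uses the original forward()

apply_patch()                  # e.g. a side effect of importing the patching library

try:
    model.generate("hi")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(before)
print(error_message)
```

In this sketch the model object itself never changes; only the class it belongs to is modified, which is why a model loaded through a plain loader can end up running patched code.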

carstendraschner commented 1 month ago

Alright, thank you very much @danielhanchen for this feedback. This also matches my experience. Maybe this should be clarified in the tutorial Colab notebooks, where it appears to be fine to keep using AutoModel after importing and using Unsloth.

Overall thanks for the quick reply and for building Unsloth in general ;) Regards Carsten

danielhanchen commented 1 month ago

Good point on that!
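
Until the notebooks mention this, one possible safeguard is to check whether Unsloth has already been imported in the current process before loading the merged model with AutoModel. The sketch below uses only the standard library; the helper names unsloth_is_loaded and assert_unsloth_not_loaded are made up for this example and are not part of Unsloth or transformers.

```python
import sys


def unsloth_is_loaded(modules=None):
    """Return True if `unsloth` or one of its submodules has been imported.

    `modules` defaults to sys.modules; passing a dict makes the check easy to test.
    """
    modules = sys.modules if modules is None else modules
    return any(name == "unsloth" or name.startswith("unsloth.") for name in modules)


def assert_unsloth_not_loaded(modules=None):
    """Raise early instead of failing later inside model.generate()."""
    if unsloth_is_loaded(modules):
        raise RuntimeError(
            "unsloth is already imported in this process; "
            "load the merged model from a fresh process or kernel instead."
        )


# Hypothetical usage: call this right before AutoModelForCausalLM.from_pretrained(...)
assert_unsloth_not_loaded({"os": None, "transformers": None})  # passes silently
```

The guard only detects the import. In a notebook, getting back to an unpatched state would still mean restarting the kernel and skipping the Unsloth cells.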