[Open] carstendraschner opened this issue 3 months ago
This is also reproducible with the default Unsloth Colab notebook. There is the step:
```python
# I highly do NOT suggest - use Unsloth if possible
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "lora_model",  # YOUR MODEL YOU USED FOR TRAINING
    load_in_4bit = load_in_4bit,
    low_cpu_mem_usage = True,
)
tokenizer = AutoTokenizer.from_pretrained("lora_model")
```
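For reference, the Unsloth-native loading path that the notebook's comment recommends instead looks roughly like this (a sketch; `max_seq_length`, `dtype`, and `load_in_4bit` should match the values used for training):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",  # the saved LoRA adapter directory
    max_seq_length = 2048,      # placeholder - match your training value
    dtype = None,               # auto-detect
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode
```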
If I then add the generation code from the notebook:
```python
from transformers import TextStreamer

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "What is a famous tall tower in Paris?",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```
the generation streams the prompt and then fails:
```
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
What is a famous tall tower in Paris?
### Input:
### Response:
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-6090b4ce7ab2> in <cell line: 12>()
     10 from transformers import TextStreamer
     11 text_streamer = TextStreamer(tokenizer)
---> 12 _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

9 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1707         if name in modules:
   1708             return modules[name]
-> 1709         raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
   1710
   1711     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'LlamaForCausalLM' object has no attribute 'max_seq_length'
```
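For what it's worth, manually setting the attribute that the patched forward expects may get past this particular error (an unverified workaround sketch, not a fix; 2048 is a placeholder for whatever `max_seq_length` you trained with):

```python
# Unverified workaround sketch: Unsloth's patched forward reads
# self.max_seq_length, which the plain PEFT loading path never sets.
# It may simply surface the next missing attribute.
model.max_seq_length = 2048  # placeholder - use the value from training
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```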
This might help you reproduce the issue, even though it is based on the LoRA adapter already. Kindest regards!
I was also wondering why

```
File ~/code/genai-ml/.venv/lib/python3.10/site-packages/unsloth/models/llama.py:864, in CausalLM_fast_forward.<locals>._CausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)
```

appears in the initial stack trace, as this path should not call Unsloth at all, right?
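My reading (an assumption, not verified against the Unsloth source): importing `unsloth` monkey-patches transformers' Llama classes process-wide, so even a model loaded via `AutoPeftModelForCausalLM` gets routed through `_CausalLM_fast_forward`, which expects the `max_seq_length` attribute that only `FastLanguageModel` sets. A quick way to inspect this:

```python
import transformers.models.llama.modeling_llama as modeling_llama

# Before importing unsloth, forward belongs to transformers itself.
print(modeling_llama.LlamaForCausalLM.forward.__module__)

import unsloth  # noqa: F401  - importing alone should apply the patches

# Afterwards, the same attribute should point at unsloth's module
# (e.g. 'unsloth.models.llama') if the patching happens at import time.
print(modeling_llama.LlamaForCausalLM.forward.__module__)
```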
Interesting: when I use the stored files in an environment where Unsloth is not installed, model loading and inference work:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    awq_model_path, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(awq_model_path, padding_side="left")

model_inputs = tokenizer("""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|><|start_header_id|>user<|end_header_id|>
Who is A. Dumbledore?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
""", return_tensors="pt").to("cuda")

generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
```
Oh wait - when you load a model locally, do not call Unsloth or load Unsloth anywhere in your code! It'll patch over everything, causing issues
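For instance, a minimal guard at the top of a local-loading script (a sketch; it just fails fast if Unsloth is already imported in the process):

```python
import sys

# Unsloth patches transformers' Llama classes at import time, so make sure
# it was never imported in this process before loading with plain transformers.
if "unsloth" in sys.modules:
    raise RuntimeError(
        "Unsloth is imported in this process; load the merged model "
        "in a clean environment with plain transformers instead."
    )
```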
Alright, thank you very much @danielhanchen for this feedback. This matches my experience. Maybe this needs to be clarified in the tutorial Colab notebooks, where it currently appears acceptable to still use AutoModel after importing and using Unsloth.
Overall, thanks for the quick reply and for building Unsloth in general ;) Regards, Carsten
Good point on that!
Hello :)
We used the default Unsloth Colab pipeline to fine-tune a Llama 3.1 8B and replicated this as a notebook in an Azure environment: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing#scrollTo=QmUBVEnvCDJv The fine-tuning worked and was tested via inference using FastLanguageModel. We then merged the model to 16-bit and stored it locally to load and run it with the default AutoModel.
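The merge/save step followed the notebook, roughly like this (a sketch; "merged_model" is a placeholder output directory, and `save_method="merged_16bit"` is per the Unsloth docs):

```python
# Merge the LoRA adapter into the base weights and save in 16-bit;
# model and tokenizer come from FastLanguageModel after training.
model.save_pretrained_merged(
    "merged_model",  # placeholder local output directory
    tokenizer,
    save_method = "merged_16bit",
)
```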
But we get:
```
AttributeError: 'LlamaForCausalLM' object has no attribute 'max_seq_length'
```
Could you help? Full stack trace: