unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Issue with merging LoRA adapters #935

Closed Ammar-Alnagar closed 3 weeks ago

Ammar-Alnagar commented 3 weeks ago

I finetuned a LoRA adapter for Mistral Nemo. I am trying to merge it into the base model, push it to my hub, and then quantize it (to overcome disk space issues). My setup:

```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False  # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Mistral-Nemo-Instruct-2407",  # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = "hf_...",  # use one if using gated models like meta-llama/Llama-2-7b-hf
)

from peft import PeftModel

# Attach the finetuned adapter to the base model, then merge the
# LoRA weights into the base weights.
adapter_path = "adapter_name"
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()

# Push the merged model and tokenizer to the Hub.
model.push_to_hub("after_adapter", token = "")
tokenizer.push_to_hub("after_adapter", token = "")
```

The error:

```
KeyError                                  Traceback (most recent call last)
in <cell line: 19>()
     17
     18 adapter_path = "adapter_name"
---> 19 model = PeftModel.from_pretrained(model, adapter_path)
     20 model = model.merge_and_unload()
     21

2 frames
/usr/local/lib/python3.10/dist-packages/peft/peft_model.py in _update_offload(self, offload_index, adapters_weights)
   1026         suffix_pos = safe_key.rfind(".")
   1027         extended_prefix = prefix + block_id + safe_key[:suffix_pos]
-> 1028         safe_module = dict(self.named_modules())[extended_prefix]
   1029         if isinstance(safe_module, BaseTunerLayer):
   1030             final_key = extended_prefix + ".base_layer" + safe_key[suffix_pos:]

KeyError: 'base_model.model.model.model.layers.24.input_layernorm'
```

I am using the free Colab notebook, if that makes any difference.

danielhanchen commented 3 weeks ago

Oh, try not using `model = model.merge_and_unload()`, and instead use `model.push_to_hub_merged(...)`.
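For anyone landing here, a minimal sketch of that flow, assuming Unsloth's `FastLanguageModel.from_pretrained` can load a saved LoRA adapter directly (as in the official notebooks); the repo ids and token below are placeholders:

```python
from unsloth import FastLanguageModel

# Load the saved LoRA adapter directly; Unsloth reads adapter_config.json
# and attaches the adapter to the base model it was trained from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "adapter_name",   # your adapter directory or Hub repo (placeholder)
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = False,
)

# Merge the LoRA weights into the base model and upload in one step,
# instead of PeftModel.from_pretrained + merge_and_unload + push_to_hub.
model.push_to_hub_merged(
    "your-username/model-merged",  # placeholder repo id
    tokenizer,
    save_method = "merged_16bit",  # "merged_4bit" also exists if disk space is tight
    token = "hf_...",
)
```

Since the end goal here is a quantized model, the documented `model.push_to_hub_gguf("your-username/model-gguf", tokenizer, quantization_method = "q4_k_m", token = "hf_...")` can likewise quantize and upload a GGUF in one step (again, repo id and token are placeholders).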