asphytheghoul opened this issue 5 months ago
@asphytheghoul Whoops, that's in llm_int8_skip_modules. In your config.json file, change llm_int8_skip_modules = "null" to llm_int8_skip_modules = null, with no quotation marks. I just fixed it on my side - sorry!
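For reference, the relevant section of the model's config.json should end up looking roughly like this (other keys omitted, values shown only as an illustration):

    "quantization_config": {
        "load_in_4bit": true,
        "llm_int8_skip_modules": null
    }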
In terms of extending the tokenizer - you also need to update the lm_head and embedding matrix, for example with:
    from typing import Dict

    import transformers

    def smart_tokenizer_and_embedding_resize(
        special_tokens_dict: Dict,
        tokenizer: transformers.PreTrainedTokenizer,
        model: transformers.PreTrainedModel,
    ):
        """Resize tokenizer and embedding.

        Note: This is the unoptimized version that may make your embedding size
        not be divisible by 64.
        """
        num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
        model.resize_token_embeddings(len(tokenizer))

        if num_new_tokens > 0:
            input_embeddings_data  = model.get_input_embeddings().weight.data
            output_embeddings_data = model.get_output_embeddings().weight.data

            # Initialize the new rows as the mean of the existing embeddings.
            input_embeddings_avg  = input_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_embeddings_avg = output_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)

            input_embeddings_data[-num_new_tokens:]  = input_embeddings_avg
            output_embeddings_data[-num_new_tokens:] = output_embeddings_avg
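For example, a minimal (hypothetical) call that adds a pad token - tokenizer and model being the ones returned by FastLanguageModel.from_pretrained:

    # Hypothetical example: register a [PAD] token and mean-initialize its new embedding rows.
    special_tokens_dict = {"pad_token": "[PAD]"}
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict = special_tokens_dict,
        tokenizer = tokenizer,
        model = model,
    )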
@danielhanchen Thank you for the quick response! Should this function be called on the model and tokenizer before patching it with LoRA adapters or after? I.e. like this:
    import transformers

    def smart_tokenizer_and_embedding_resize(
        tokenizer: transformers.PreTrainedTokenizer,
        model: transformers.PreTrainedModel,
    ):
        """Resize tokenizer and embedding.

        Note: This is the unoptimized version that may make your embedding size
        not be divisible by 64.
        """
        model.resize_token_embeddings(len(tokenizer))
        num_new_tokens = 15937
        if num_new_tokens > 0:
            input_embeddings_data  = model.get_input_embeddings().weight.data
            output_embeddings_data = model.get_output_embeddings().weight.data

            input_embeddings_avg  = input_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_embeddings_avg = output_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)

            input_embeddings_data[-num_new_tokens:]  = input_embeddings_avg
            output_embeddings_data[-num_new_tokens:] = output_embeddings_avg
        print("Done!")

    smart_tokenizer_and_embedding_resize(tokenizer, model)

    model = FastLanguageModel.get_peft_model(
        model,
        r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj",],
        lora_alpha = 16,
        lora_dropout = 0, # Supports any, but = 0 is optimized
        bias = "none",    # Supports any, but = "none" is optimized
        use_gradient_checkpointing = True,
        random_state = 3407,
        use_rslora = False,  # We support rank stabilized LoRA
        loftq_config = None, # And LoftQ
    )
or like this:
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj",],
        lora_alpha = 16,
        lora_dropout = 0, # Supports any, but = 0 is optimized
        bias = "none",    # Supports any, but = "none" is optimized
        use_gradient_checkpointing = True,
        random_state = 3407,
        use_rslora = False,  # We support rank stabilized LoRA
        loftq_config = None, # And LoftQ
    )

    def smart_tokenizer_and_embedding_resize(
        tokenizer: transformers.PreTrainedTokenizer,
        model: transformers.PreTrainedModel,
    ):
        """Resize tokenizer and embedding.

        Note: This is the unoptimized version that may make your embedding size
        not be divisible by 64.
        """
        model.resize_token_embeddings(len(tokenizer))
        num_new_tokens = 15937
        if num_new_tokens > 0:
            input_embeddings_data  = model.get_input_embeddings().weight.data
            output_embeddings_data = model.get_output_embeddings().weight.data

            input_embeddings_avg  = input_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_embeddings_avg = output_embeddings_data[:-num_new_tokens].mean(dim=0, keepdim=True)

            input_embeddings_data[-num_new_tokens:]  = input_embeddings_avg
            output_embeddings_data[-num_new_tokens:] = output_embeddings_avg
        print("Done!")

    smart_tokenizer_and_embedding_resize(tokenizer, model)
Thanks
The first one should be correct, i.e.:

    model, tokenizer = FastLanguageModel.from_pretrained(...)
    edit_tokenizer(tokenizer)
    smart_tokenizer_and_embedding_resize(tokenizer, model)
    model = FastLanguageModel.get_peft_model(...)
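Concretely, a minimal sketch of that order could look like the following (the base model name, max_seq_length, and new tokens are just placeholders, and smart_tokenizer_and_embedding_resize is the helper defined above):

    from unsloth import FastLanguageModel

    # 1) Load the 4-bit base model and its tokenizer.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/llama-2-7b-bnb-4bit",
        max_seq_length = 2048,
        dtype = None,
        load_in_4bit = True,
    )

    # 2) Extend the tokenizer and resize the embeddings / lm_head
    #    (placeholder tokens; uses the helper defined earlier in this thread).
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict = {"additional_special_tokens": ["<new_token_1>", "<new_token_2>"]},
        tokenizer = tokenizer,
        model = model,
    )

    # 3) Only then attach the LoRA adapters.
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj",],
        lora_alpha = 16,
        lora_dropout = 0,
        bias = "none",
        use_gradient_checkpointing = True,
        random_state = 3407,
    )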
Hello @danielhanchen, I tried your suggestion and unfortunately I still get errors, but I have understood the problem. When I save the trained adapters using

    model.save_pretrained("name_of_model")
    tokenizer.save_pretrained("name_of_model")

and try to load them again using:
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "./name_of_model",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
the error stems from the fact that Unsloth is looking at the adapter_config.json file and at the base_model_name_or_path key. The value of this is unsloth/llama-2-7b-bnb-4bit, so it is trying to apply the adapters onto the LLaMA-2 model, which has an embedding size of (32000, 4096). That's the main cause of the error:

    RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([47943, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
        size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([47943, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

How do you suggest I proceed? Thanks
@asphytheghoul If you are primarily using it for inference, I suggest using HF's general loading mechanisms for now - I don't think FastLanguageModel can support expanded vocabs yet. I'll add a fix maybe in the next few days, but until then general HF loading is the quick fix. Sorry the issue is there though!
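As a rough sketch of what that general HF loading could look like here (assuming the adapter folder from save_pretrained above also contains your extended tokenizer; illustrative only):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    adapter_dir = "./name_of_model"  # folder written by model/tokenizer.save_pretrained above

    # Load the extended tokenizer that was saved alongside the adapters.
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

    # Load the original base model, then resize it to the extended vocabulary
    # *before* attaching the adapters so the checkpoint shapes match.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    model.resize_token_embeddings(len(tokenizer))

    model = PeftModel.from_pretrained(model, adapter_dir)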
@danielhanchen Hello! I have found a solution to this problem. If anyone is facing this, it is an expected situation and not an issue with Unsloth in any way. The reason it happens is that you load the base model (for example LLaMA-2), resize the token embeddings, and fine-tune the model on your data. Once you have finished fine-tuning, you save the adapters. That works well so far, because you trained the LoRA adapters with resized embeddings using your extended-vocabulary tokenizer.

The problem appears when you try to load the adapters again, because they were trained and saved against the LLaMA-2 model configuration. If you inspect the adapter_config.json file, you will find the base_model_name_or_path key holding the value of the base model you used while fine-tuning (in this case meta-llama/Llama-2-7b-hf). So loading looks at the configuration file of the LLaMA-2 model, completely ignores the fact that you had resized the embeddings, and tries to load the adapters you trained onto that model, which results in the following error:
    RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([47943, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
        size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([47943, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
The solution to this problem is to use the push_to_hub_merged() method with save_method = "merged_16bit" specified, so you save your trained model as fully merged 16-bit weights (the 16-bit format that can also be used with vLLM). For example:

    model.push_to_hub_merged("NAME_OF_MODEL", tokenizer, save_method = "merged_16bit", token = token)
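The merged 16-bit checkpoint already contains the resized embeddings and lm_head, so (with a hypothetical repo name, and placeholder settings) it can then be loaded back directly, without going through adapter_config.json at all:

    from unsloth import FastLanguageModel

    # Hypothetical repo name; the merged model carries the 47943-token embedding
    # matrix itself, so there is no adapter/base-model shape mismatch on load.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "username/NAME_OF_MODEL",
        max_seq_length = 2048,
        dtype = None,
        load_in_4bit = False,
    )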
Note: this might not be the only solution, but it's a workaround I explored and found to work for my case.
Thanks!
@asphytheghoul Oh yep great point / solution on merging the model to 16bit :) Not sure why I didn't mention that whoops :) But super glad you got it to work in the end!
Hello, so I was fine-tuning a LLaMA-2 model with Unsloth using a tokenizer of my own. It has an extended vocabulary of around 48000 tokens in total; the tokenizer is compatible, and checks have been made on my end to ensure the same. This is the code I have implemented using the Colab notebook you have provided, and I am unable to load my adapters after fine-tuning:

    model.save_pretrained("translation-en-hin-no-merges") # Local saving

But when I then run:

    if True: model.save_pretrained_merged("model_16bit", tokenizer, save_method = "merged_16bit",)

I get this error:

    ValueError: llm_int8_skip_modules must be a list of strings

Please do help out :)