AbnetS opened this issue 2 weeks ago
Are you using the unsloth `add_new_tokens` function?

```python
from unsloth import add_new_tokens

add_new_tokens(model, tokenizer, new_tokens = ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"])
```
I tried this and it works, and I can use `trainer.train(resume_from_checkpoint = True)` too.
Don't forget to call this before `FastLanguageModel.get_peft_model` .-.
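To make the ordering concrete, here is a minimal sketch (the base model name and LoRA arguments are copied from the snippets later in this thread; treat it as an illustration, not a drop-in script):

```python
from unsloth import FastLanguageModel, add_new_tokens

# Load the base model first
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit",
    max_seq_length = 4096,
    load_in_4bit = True,
)

# 1) Add the new tokens BEFORE attaching the LoRA adapters...
add_new_tokens(model, tokenizer, new_tokens = ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"])

# 2) ...then wrap the model with PEFT, training embed_tokens / lm_head as well
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
)
```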
I might have reproduced your problem, please let me know if this is the bug or not .-.
What I did here: after the checkpoint is saved, I load the model but I don't run add_new_tokens again. That might be the problem? .-.
Thanks @Erland366 for the replies and the suggestions.
The error is exactly that. To answer your questions and explain what I was trying to do, I have listed some code snippets below:
1. I separately trained a tokenizer ("am1_tokenizer") for my local language with SentencePiece, and to merge it into the Llama 3.2 tokenizer ("tokenizer") I used the following:
```python
from tqdm import tqdm
from transformers import AddedToken

# am1_tokenizer / am1: the separately trained SentencePiece tokenizer for my language
for p in tqdm(am1_tokenizer.pieces):
    tokenizer.add_tokens(AddedToken(am1.decode(p.piece), normalized = False, special = False))

tokenizer.save_pretrained("amh_custom_tokenizer")
```
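A quick sanity check on the merged vocabulary size can't hurt (the counts below are the ones reported elsewhere in this issue: 146,452 after merging versus 128,256 for the stock Llama 3.2 tokenizer):

```python
from transformers import PreTrainedTokenizerFast

merged = PreTrainedTokenizerFast.from_pretrained("amh_custom_tokenizer")
print(len(merged))  # 146452 here; the original Llama 3.2 tokenizer has 128256 tokens
```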
2. I then ran continued pretraining on a text dataset for language adaptation:
```python
from transformers import PreTrainedTokenizerFast
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

max_seq_length = 4096
dtype = None          # None for auto detection
load_in_4bit = True

# Load the extended tokenizer saved in step 1
tokenizer = PreTrainedTokenizerFast.from_pretrained("amh_custom_tokenizer")

# Discard the default tokenizer returned by the loader; we use the extended one
model, _ = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Grow the embedding matrix and lm_head to the extended vocabulary
model.resize_token_embeddings(len(tokenizer))

model = FastLanguageModel.get_peft_model(
    model,
    r = 128,           # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],  # embed_tokens / lm_head added for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True, # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

# dataset: the pretraining text corpus, loaded elsewhere in the notebook
trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 8,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 16,
        # max_steps = 2000,
        warmup_ratio = 0.1,
        num_train_epochs = 1,
        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "models/llama3.2_amh_19m_3",
        save_strategy = "steps",
        save_steps = 5000,
    ),
)

trainer_stats = trainer.train()
```
3. So far, so good. The training ran without problems, saving a checkpoint automatically every 5000 steps, and even if training is interrupted, `resume_from_checkpoint = True` works well, as you said.
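For completeness, resuming after an interruption is just a matter of re-running the same setup above and calling the line below, which picks up the newest checkpoint in `output_dir`:

```python
trainer_stats = trainer.train(resume_from_checkpoint = True)
```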
4. But while the continued pretraining is still running, I also want to finetune the saved checkpoints further on an instruction dataset (a different dataset) for a downstream task, so I tried to load one of them in another notebook as follows:
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # 2048 Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model = "/path/to/checkpoint-5000"
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = model,
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
```

```
Unsloth: Tokenizer is most likely buggy, and Unsloth failed to repair it.
It will still work, but beware of out of bounds memory accesses.
Please file an issue on the model owner's repo about this issue.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[2], line 9
5 load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
7 model = "/home/abnets/unsloth/models/llama3.2_amh_19m_2/checkpoint-1000"
----> 9 model, tokenizer = FastLanguageModel.from_pretrained(
10 model_name = model,
11 max_seq_length = max_seq_length,
12 dtype = dtype,
13 load_in_4bit = load_in_4bit,
14 # resize_model_vocab = 146452
15 # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
16 )
File ~/unsloth/unsloth_env/lib/python3.10/site-packages/unsloth/models/loader.py:383, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
379 if is_peft:
380 # From https://github.com/huggingface/peft/issues/184
381 # Now add PEFT adapters
382 model.enable_input_require_grads()
--> 383 model = PeftModel.from_pretrained(
384 model,
385 old_model_name,
386 token = token,
387 revision = revision,
388 is_trainable = True,
389 trust_remote_code = trust_remote_code,
390 )
391 # Patch it as well!
392 model = dispatch_model.patch_peft_model(model, use_gradient_checkpointing)
File ~/unsloth/unsloth_env/lib/python3.10/site-packages/peft/peft_model.py:586, in PeftModel.from_pretrained(cls, model, model_id, adapter_name, is_trainable, config, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, **kwargs)
577 else:
578 model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](
579 model,
580 config,
(...)
583 low_cpu_mem_usage=low_cpu_mem_usage,
584 )
--> 586 model.load_adapter(
587 model_id,
588 adapter_name,
589 is_trainable=is_trainable,
590 autocast_adapter_dtype=autocast_adapter_dtype,
591 low_cpu_mem_usage=low_cpu_mem_usage,
592 **kwargs,
593 )
595 return model
File ~/unsloth/unsloth_env/lib/python3.10/site-packages/peft/peft_model.py:1181, in PeftModel.load_adapter(self, model_id, adapter_name, is_trainable, torch_device, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, **kwargs)
1179 # load the weights into the model
1180 ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
-> 1181 load_result = set_peft_model_state_dict(
1182 self,
1183 adapters_weights,
1184 adapter_name=adapter_name,
1185 ignore_mismatched_sizes=ignore_mismatched_sizes,
1186 low_cpu_mem_usage=low_cpu_mem_usage,
1187 )
1188 if (
1189 (getattr(self, "hf_device_map", None) is not None)
1190 and (len(set(self.hf_device_map.values()).intersection({"cpu", "disk"})) > 0)
1191 and len(self.peft_config) == 1
1192 ):
1193 device_map = kwargs.get("device_map", "auto")
File ~/unsloth/unsloth_env/lib/python3.10/site-packages/peft/utils/save_and_load.py:464, in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name, ignore_mismatched_sizes, low_cpu_mem_usage)
462 module._move_adapter_to_device_of_base_layer(adapter_name)
463 else:
--> 464 load_result = model.load_state_dict(peft_model_state_dict, strict=False)
466 if config.is_prompt_learning:
467 model.prompt_encoder[adapter_name].embedding.load_state_dict(
468 {"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True
469 )
File ~/unsloth/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py:2584, in Module.load_state_dict(self, state_dict, strict, assign)
2576 error_msgs.insert(
2577 0,
2578 "Missing key(s) in state_dict: {}. ".format(
2579 ", ".join(f'"{k}"' for k in missing_keys)
2580 ),
2581 )
2583 if len(error_msgs) > 0:
-> 2584 raise RuntimeError(
2585 "Error(s) in loading state_dict for {}:\n\t{}".format(
2586 self.__class__.__name__, "\n\t".join(error_msgs)
2587 )
2588 )
2589 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([146452, 3072]) from checkpoint, the shape in current model is torch.Size([128256, 3072]).
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([146452, 3072]) from checkpoint, the shape in current model is torch.Size([128256, 3072]).
```
The size mismatch happens inside `FastLanguageModel.from_pretrained`. If the checkpoints had been merged with the base model before the trainer saved them automatically, this problem wouldn't have happened; merging solves it, as indicated in https://github.com/unslothai/unsloth/issues/154.

I hope that is clear. Please let me know if I am doing something wrong, or if there is a way to automatically save the merged checkpoint.
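On the "automatically save the merged checkpoint" part, one direction that might work (untested sketch; the callback class, the directory naming, and the `save_method` choice are my own assumptions) is a `transformers` `TrainerCallback` that writes an extra merged copy every time the trainer saves a checkpoint, using Unsloth's `save_pretrained_merged`:

```python
import os
from transformers import TrainerCallback

class SaveMergedCallback(TrainerCallback):
    """Hypothetical helper: after each checkpoint save, also write a merged copy."""

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def on_save(self, args, state, control, **kwargs):
        merged_dir = os.path.join(args.output_dir, f"merged-{state.global_step}")
        # save_pretrained_merged merges the LoRA adapters into the base weights before saving
        self.model.save_pretrained_merged(merged_dir, self.tokenizer, save_method = "merged_16bit")
        return control

# Register it on the trainer defined above:
# trainer.add_callback(SaveMergedCallback(model, tokenizer))
```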
Oh yeah, I think I can implement it
As discussed in issue https://github.com/unslothai/unsloth/issues/154#issue-2119969174, I am also working with an extended tokenizer to accommodate words of a new language. I merged the Llama 3.2 tokenizer with my own tokenizer, which increased its size to 146,452 (as opposed to 128,256, the size of the original Llama 3.2 tokenizer). I am running continual pretraining and saving checkpoints every certain number of steps.

I want to finetune the checkpoints further on an instruction dataset to track their performance. However, I am not able to load the checkpoints due to the mismatch between the tokenizer size of the base model and that of the adapter. I read about the suggested solution: merge and save the checkpoints. But since Unsloth saves the checkpoints automatically, I don't get a chance to do that without first loading the models. So, what should I do? Any suggestion is appreciated!
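One thing worth trying (untested sketch) is the `resize_model_vocab` argument that shows up in the loader signature in the traceback above, and which is commented out in the failing cell; the assumption here is that it resizes the freshly loaded base model to the extended vocabulary before the adapter weights are applied:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/path/to/checkpoint-5000",   # an automatically saved checkpoint directory
    max_seq_length = 4096,
    dtype = None,
    load_in_4bit = True,
    resize_model_vocab = 146452,               # size of the extended tokenizer
)
```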