Also, can anyone suggest a good inference method to test the fine-tuned multi-turn chatbot? Currently, the inference code in the chatbot notebook only supports a single turn.
```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)
```
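For multi-turn testing, one workaround is to keep the whole conversation in the `messages` list and re-apply the chat template every turn. A rough sketch, assuming the same `model` and `tokenizer` as above (this is not official notebook code):

```python
# Minimal multi-turn loop (sketch): keep the full history in `messages`
# and re-apply the chat template each turn.
messages = []

def chat(user_text, max_new_tokens = 128):
    messages.append({"from": "human", "value": user_text})
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,  # Must add for generation
        return_tensors = "pt",
    ).to("cuda")
    outputs = model.generate(input_ids = inputs, max_new_tokens = max_new_tokens, use_cache = True)
    # Decode only the newly generated tokens, then store them as the assistant turn
    reply = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens = True)
    messages.append({"from": "gpt", "value": reply})
    return reply

print(chat("Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,"))
print(chat("Now give me the next five terms after that."))
```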
I started to get this error today too, strange...
Try using the Instruct model ("unsloth/llama-3-8b-Instruct-bnb-4bit").
@sumukshashidhar @liwd190019 Apologies! As @dmitrii-palisaderesearch mentioned, please use the instruct version - the base version will error out, since Unsloth auto-checks whether some token embeddings are all 0s. If you still want to use the base model, either do not use the llama-3 chat template (just use Alpaca), or train `embed_tokens` and `lm_head`.
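For the base-model route, the rough shape is something like the following - a sketch only, with illustrative LoRA hyperparameters:

```python
# Sketch: make embed_tokens & lm_head trainable so the untrained-token check
# passes when using the base model with the llama-3 template.
# Hyperparameters below are illustrative, not a recommendation.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# UnslothTrainer + UnslothTrainingArguments let you give the embeddings a
# smaller learning rate than the rest of the LoRA weights.
from unsloth import UnslothTrainer, UnslothTrainingArguments

args = UnslothTrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    max_steps = 60,
    learning_rate = 2e-4,            # illustrative
    embedding_learning_rate = 2e-5,  # typically much lower for embed_tokens / lm_head
    output_dir = "outputs",
)
```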
@danielhanchen Do you know why it started to throw this error, when training with the base model worked successfully a couple of months ago?
I added a check in Unsloth that tests whether your embeddings are untrained - I might have to change the logic, actually.
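In spirit, it flags token rows of the embedding matrix that are still all zeros - something along these lines, purely as an illustration, not the actual Unsloth code:

```python
import torch

# Illustration only: "untrained tokens" = embedding rows that are exactly zero.
embed = model.get_input_embeddings().weight           # [vocab_size, hidden_dim]
untrained = (embed == 0).all(dim = -1).nonzero().flatten()
print("Token ids with all-zero embeddings:", untrained.tolist())
```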
Any update on this? I am getting this error too with a base model.
@danielhanchen I am getting this error too, and I was following your notebook on instruct fine-tuning (I am not using the base model). My dataset contains LaTeX symbols - could that be the cause?
What do you mean by "if your embeddings are untrained"?
My data is generated from Llama only.
Same here. I was fine-tuning Llama-3.1-7B with QLoRA normally, then used Unsloth to continue the training with a longer context.
There are actually 2 problems:
1. Cannot resume using UnslothTrainer:
```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments
from unsloth import FastLanguageModel

max_seq_length = data_args.max_seq_length
dtype = torch_dtype
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = training_args.pretrain_qlora_path,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

unsloth_args = UnslothTrainingArguments(
    per_device_train_batch_size = training_args.per_device_train_batch_size,
    gradient_accumulation_steps = training_args.gradient_accumulation_steps,
    max_steps = training_args.max_steps,
    save_steps = training_args.save_steps,
    logging_steps = training_args.logging_steps,
    warmup_steps = training_args.warmup_steps,
    save_total_limit = training_args.save_total_limit,
    num_train_epochs = training_args.num_train_epochs,
    learning_rate = training_args.learning_rate,
    lr_scheduler_type = training_args.lr_scheduler_type,
    weight_decay = training_args.weight_decay,
    gradient_checkpointing = training_args.gradient_checkpointing,
    embedding_learning_rate = training_args.embedding_learning_rate,
    fp16 = training_args.fp16,
    bf16 = training_args.bf16,
    tf32 = True,
    seed = 3407,
    output_dir = training_args.output_dir,
    dataloader_num_workers = training_args.dataloader_num_workers,
    ddp_find_unused_parameters = training_args.ddp_find_unused_parameters,
    overwrite_output_dir = training_args.overwrite_output_dir,
    ignore_data_skip = training_args.ignore_data_skip, # Important to start Unsloth from datapoint 0
    prediction_loss_only = training_args.prediction_loss_only,
    evaluation_strategy = training_args.evaluation_strategy,
)

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    args = unsloth_args,
    train_dataset = train_dataset,
    # dataset_text_field = "text",
    max_seq_length = data_args.max_seq_length,
    dataset_num_proc = 8,
    packing = True, # Can make training 5x faster for short sequences.
    optimizers = (optimizer_adamw, lr_scheduler_adamw),
)

resume_ckp = "./QLora_finetune_with_embed_lmhead/checkpoint-1200"
train_result = trainer.train(resume_from_checkpoint = resume_ckp)
```
--> ValueError: Unsloth: Untrained tokens of [[]] found, but embed_tokens & lm_head not trainable, causing NaNs. Restart then add embed_tokens & lm_head to FastLanguageModel.get_peft_model(target_modules = [..., "embed_tokens", "lm_head",]). Are you using the base model? Instead, use the instruct version to silence this warning.
Any idea why? It was working fine a few days ago, I think!
2. Weird warning when resuming with Transformers' Trainer instead of UnslothTrainer: As a side note, resuming with Transformers' Trainer does work, but I got an "RNG file not found" warning even though both "rng_state_0.pth" and "rng_state_1.pth" were present in the checkpoint folder! The resume continues training, but who knows whether the quality is OK, so we stopped. Please also take a look at the whole resume path, @danielhanchen.
Thanks, Steve
@milsun Apologies - I saw you also commented on the other issue; hope it got at least partially resolved. Sorry for the delay as well!
@paraschopra It's possible extra symbols might be breaking the tokenizer, but I'm unsure - these can cause NaN gradients, so I error out. Could you open a separate issue so I can look into it more - thanks!
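As a rough way to check whether those symbols are the culprit, you could tokenize a sample and see whether any of its token ids land on all-zero embedding rows (a diagnostic sketch, not Unsloth's internal check):

```python
import torch

# Rough diagnostic: tokenize a sample containing the LaTeX symbols and flag
# ids whose embedding row is exactly zero. Replace `sample` with a row from
# your own dataset.
sample = r"Solve \frac{x^2}{\sqrt{2}} = \alpha + \beta"
ids = tokenizer(sample, return_tensors = "pt").input_ids[0]
zero_rows = (model.get_input_embeddings().weight == 0).all(dim = -1)
suspects = [i for i in ids.tolist() if zero_rows[i]]
print("Suspect token ids:", suspects, tokenizer.convert_ids_to_tokens(suspects))
```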
@thusinh1969 Apologies for the delay - I saw your other issue as well!
I want to use Llama 3 for a multi-turn dialogue task, so I tried fine-tuning it on a custom dataset of mine, made by simply extracting all the dialogue content from SODA and reformatting it into Llama 3's chat template.
I think the resulting dataset is quite similar to the original one. But when I tried to train the model, I got the following error:
As it indicates, the cause is that there are some untrained tokens. The error can be avoided by following the hints, but I can't figure out why I introduced those untrained tokens in the first place - after all, the two datasets (my custom one and the default one in the notebook) look very similar.
Here is a link to the colab, feel free to comment and give advice!
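Roughly, the reformatting maps each SODA dialogue into alternating ShareGPT-style turns and then applies the llama-3 template - a simplified sketch of the idea (the Hugging Face dataset path and the "dialogue" column name here are assumptions; the exact code is in the colab):

```python
# Sketch of the reformatting. Assumes `tokenizer` was already loaded via
# FastLanguageModel.from_pretrained, and that SODA exposes a "dialogue"
# column holding a list of utterances.
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},
)

def to_sharegpt(example):
    # Alternate human/gpt turns, assuming the dialogue starts with the user
    roles = ["human", "gpt"]
    convo = [{"from": roles[i % 2], "value": utt} for i, utt in enumerate(example["dialogue"])]
    return {"text": tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)}

soda = load_dataset("allenai/soda", split = "train")
dataset = soda.map(to_sharegpt, remove_columns = soda.column_names)
```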