@daegonYu Yes that should work (I think) - The continued pretraining notebook does train on the same LoRA adapters twice - https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing so it should function (hopefully)
If I load the LoRA model from outside and train it with UnslothTrainer without calling get_peft_model(), I can keep training the previously generated LoRA parameters. Thank you for your answer.
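For anyone following along, here is a minimal sketch of that flow (the checkpoint path and hyperparameters below are placeholders, not from this thread):

from unsloth import FastLanguageModel

# Illustrative sketch: load a checkpoint that already contains LoRA adapters and keep
# training those adapters. Because the saved adapters are picked up from the checkpoint,
# get_peft_model() is NOT called again, so no new LoRA layers are stacked on the old ones.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "outputs/previous_cpt_checkpoint",  # placeholder path to the saved LoRA
    max_seq_length = 2048,
    load_in_4bit = True,
)
# ...then pass `model` directly to UnslothTrainer / SFTTrainer on the new dataset.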
Additionally, I have a question. When training a decoder model, my understanding is that the Instruction part is fed to the model as context and the loss is computed only on the Response part, but in the colab you suggested, the data is passed in as training data without that distinction. Can the model still learn effectively this way? Also, could you point me to a blog or paper that explains this?
@daegonYu You might be interested in our conversational notebook which masks out the instruction - https://colab.research.google.com/drive/1T5-zKWM_5OD21QHwXHiV9ixTRR7k3iB9?usp=sharing
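In case it helps to see what "masking out the instruction" means in practice, here is a rough illustration (not the notebook's exact code): prompt tokens get the label -100 so the cross-entropy loss ignores them, and only response tokens contribute to the loss.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")  # any tokenizer works for this illustration

prompt_ids   = tokenizer("### Question: What is 2+2?\n ### Answer:", add_special_tokens=False)["input_ids"]
response_ids = tokenizer(" 4", add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + response_ids
labels    = [-100] * len(prompt_ids) + response_ids  # tokens labeled -100 are ignored by the loss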
Oh, this is what I was looking for. Thank you!
One thing I'm wondering about while researching this: is it okay to assume that using DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer) has the same effect as using DataCollatorForSeq2Seq(tokenizer=tokenizer) together with train_on_responses_only(trainer, response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n")?
Here's more detailed example code.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="/tmp"),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

max_seq_length = 2048  # should match the value passed to FastLanguageModel.from_pretrained

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    # instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
@daegonYu Sorry on the delay! Yes they're equivalent EXCEPT if you're doing more than 1 conversation. HF's one does not support it, whilst Unsloth does.
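To illustrate what "more than 1 conversation" can look like, here is a small made-up multi-turn example (contents are placeholders): with several assistant turns in one training example, Unsloth's train_on_responses_only is meant to keep every assistant turn as labels while masking every user turn.

messages = [
    {"role": "user",      "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user",      "content": "Write a haiku about rain."},
    {"role": "assistant", "content": "Soft rain on the roof..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
# train_on_responses_only(trainer, instruction_part=..., response_part=...) masks both
# user turns and trains on both assistant replies in this single example.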
May I ask about max_seq_length? During DPO training, when initializing the model from the SFT checkpoint as follows:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{args.ckpt_name}",
    max_seq_length = max_seq_length,
)

why is max_seq_length = 4096 here, while in the SFT trainer this arg is 2048? What is the relation between max_seq_length and the args used when initializing the DPO trainer, e.g. max_length and max_prompt_length = prompt_length?
@Candice1995 Apologies on the delay - DPO has a prompt, then 2 other fields - the accepted (chosen) or rejected answer to the prompt. These fields have varying lengths, so we have to truncate or specify the length for each. Unsloth's max_seq_length is the maximum total length of all the fields combined.
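A rough sketch of how those lengths relate (the numbers below are placeholders, not from this thread; depending on your TRL version, max_length / max_prompt_length may be passed via DPOConfig instead of directly to DPOTrainer):

from trl import DPOTrainer
from transformers import TrainingArguments

# max_prompt_length: truncation length for the prompt alone
# max_length:        truncation length for prompt + chosen/rejected completion
# max_seq_length:    Unsloth's cap when loading the model; it should be at least
#                    max_length (e.g. 4096 >= 1024) and may differ from the 2048 used in SFT.
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = TrainingArguments(per_device_train_batch_size = 2, output_dir = "outputs"),
    train_dataset = dataset,
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)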
Can I load a model trained with Unsloth's CPT (continued pretraining) method, make only the saved LoRA parameters trainable again, and then continue CPT on a different dataset? In other words, I want to keep training the LoRA parameters of the CPT-trained model on a new dataset. Are there any reference documents or guidelines? If I run the code below to continue CPT on a different dataset, won't new LoRA layers be created on top of the existing ones? I want to reuse the LoRA layers created in the previous CPT step as they are.