unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

`train_on_responses_only` doesn't work for Mistral models #1262

Open XiaomoWu opened 2 weeks ago

XiaomoWu commented 2 weeks ago

I'm using a Mistral model and want to train only on responses. `train_on_responses_only` is supposed to mask only the user prompt; however, the following code masks both the user and assistant messages.

import torch
from datasets import Dataset
from trl import SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, train_on_responses_only

# init model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
)

# add LoRA adapter
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
)

# prepare data
message = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I am fine, thank you!"},
]
dataset = Dataset.from_dict({"chat": [message]})

# apply chat template
dataset = dataset.map(
    lambda x: {
        "prompt": tokenizer.apply_chat_template(
            x["chat"], tokenize=False, add_generation_prompt=False
        )
    },
    batched=True,
)

# init trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="prompt",
)

# only train on responses
trainer = train_on_responses_only(
    trainer,
    instruction_part = "[INST] ",
    response_part = "[/INST] ",
)

# test print: show the full input, then the labels with masked positions (-100) rendered as spaces
print(tokenizer.decode(trainer.train_dataset[0]["input_ids"]))
space = tokenizer(' ', add_special_tokens = False).input_ids[0]
print(tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[0]["labels"]]))

The code, in particular the values of `instruction_part` and `response_part`, is from #1229. I tried different variants of `instruction_part` and `response_part`, such as adding a trailing \n or space, but without success.

OS: Ubuntu 24.04
PyTorch: 2.5.0 + cu124
Unsloth: 2024.11.5
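For readers following along, the masking that the report expects can be sketched in plain Python. This is only an illustration of the general technique (keep labels for tokens after a response marker, set everything else to -100); it is not Unsloth's actual implementation. The helper names `find_all` and `mask_non_responses` and all token IDs below are made up for the example. It also shows how a marker whose token IDs never appear in the sequence leaves every label masked, which matches the symptom described above, though whether that is the cause in this issue is an assumption.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def find_all(haystack: Sequence[int], needle: Sequence[int]) -> List[int]:
    """Return every start index where `needle` occurs in `haystack`."""
    n = len(needle)
    if n == 0:
        return []
    return [
        i
        for i in range(len(haystack) - n + 1)
        if list(haystack[i : i + n]) == list(needle)
    ]


def mask_non_responses(
    input_ids: Sequence[int],
    instruction_ids: Sequence[int],
    response_ids: Sequence[int],
) -> List[int]:
    """Hypothetical helper: keep labels only for tokens that follow a
    response marker, up to the next instruction marker or end of sequence."""
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for start in find_all(input_ids, response_ids):
        begin = start + len(response_ids)
        end = next((i for i in instruction_starts if i >= begin), len(input_ids))
        labels[begin:end] = input_ids[begin:end]
    return labels


# Toy IDs (assumed, not real Mistral vocabulary):
# 1 = BOS, 3 = "[INST]", 4 = "[/INST]", 2 = EOS
input_ids = [1, 3, 10, 11, 4, 20, 21, 2]

# Markers that match the tokenized text: only the reply is kept.
print(mask_non_responses(input_ids, [3], [4]))
# -> [-100, -100, -100, -100, -100, 20, 21, 2]

# A marker that tokenizes to IDs not present in the sequence
# (for example because of an extra trailing-space token): everything is masked.
print(mask_non_responses(input_ids, [3], [4, 99]))
# -> [-100, -100, -100, -100, -100, -100, -100, -100]
```

A practical check along the same lines is to compare the token IDs of the marker strings with the IDs in the templated example, since a marker string tokenized on its own may not produce the same IDs it has inside the full prompt.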

danielhanchen commented 2 weeks ago

Oh, on chat templates: since you did a PR to clean up the system message, would you be able to investigate this one, @Erland366? Thanks :)

Erland366 commented 2 weeks ago

Will check them out!

kldzj commented 4 days ago

Hey @Erland366, did you have time to look into this? :)

It'd be great to get multi-turn chats to work with the Mistral template

Erland366 commented 3 days ago

@kldzj Sorry for the very late response

Actually, Daniel already fixed this in this discussion: https://github.com/unslothai/unsloth/issues/1290#issuecomment-2478130636

Hopefully it works now .-.

selalipop commented 1 day ago

Can confirm the fix linked in that thread works, but not sure it's in the latest release

I had to install the unsloth-zoo nightly and restart my kernel
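To see which builds are actually loaded after reinstalling, a small check like the one below may help. It is a sketch using only the standard library; the distribution names `unsloth` and `unsloth_zoo` are assumed to match what pip installed, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust if pip lists them differently.
for name in ("unsloth", "unsloth_zoo"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it in a fresh kernel, since an already-imported module keeps the old code until the process restarts.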