unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Is it possible to use `train_on_responses_only` with the Mistral template? #1229

Open kldzj opened 3 weeks ago

kldzj commented 3 weeks ago

train_on_responses_only expects instruction_part and response_part, which do not seem to work with the Mistral chat template.

Whenever I try some kind of [INST] combination, the decoded labels (with masked tokens replaced by spaces) are always just a newline for me. Perhaps I'm doing it wrong.

Is it possible to train on responses only with the Mistral chat template? If so, could you kindly provide an example? :)

dendarrion commented 2 weeks ago

Try using [/INST] as the response part; that is what I am currently doing. I am not sure whether this works for multi-turn chats, though.

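from unsloth.chat_templates import train_on_responses_only

# Wrap the existing trainer so that only the assistant replies contribute to the loss.
trainer = train_on_responses_only(
    base_trainer,
    instruction_part='[INST] ',
    response_part='[/INST] '
)

# Full training example, decoded from input_ids.
tokenizer.decode(trainer.train_dataset[0]["input_ids"])
'[INST] Very long content… [/INST] Assistant answer…'

# Swap masked labels (-100) for a space token to see which part is trained on.
space = tokenizer(' ', add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[0]["labels"]])
'                        Assistant answer…'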
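
For intuition about what this kind of masking does across several turns, here is a minimal sketch on toy token IDs. It is not Unsloth's implementation: the helper names `find_all` and `mask_non_responses` and the marker IDs are made up for illustration, and it assumes each marker maps to a fixed sequence of token IDs.

```python
IGNORE_INDEX = -100  # label value skipped by the loss, as in the decode above


def find_all(seq, pattern):
    """Return every start index where pattern occurs in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_non_responses(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens between a response marker and the next
    instruction marker (or the end of the sequence); mask everything else."""
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for r in find_all(input_ids, response_ids):
        start = r + len(response_ids)
        # The response ends where the next instruction begins, if there is one.
        later = [i for i in instruction_starts if i >= start]
        end = later[0] if later else len(input_ids)
        labels[start:end] = input_ids[start:end]
    return labels


# Toy IDs (hypothetical): 1 stands for "[INST]", 2 for "[/INST]", the rest is content.
ids = [1, 10, 11, 2, 20, 21, 1, 12, 2, 22]
labels = mask_non_responses(ids, instruction_ids=[1], response_ids=[2])
print(labels)  # [-100, -100, -100, -100, 20, 21, -100, -100, -100, 22]
```

In this sketch, if the marker's token IDs never occur in input_ids exactly as given, nothing matches and every label stays masked. That might be one reason for an all-blank decode, so it can help to compare how the marker string tokenizes on its own with how it tokenizes inside a full prompt.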

kldzj commented 2 weeks ago

Thanks a lot for your input! :)

I'll give it another shot, but I think I tried this combination before. It's crucial that it works with a multi-turn dataset.
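
One way to check the multi-turn case might be to count the unmasked runs in the labels of a single example and compare that with the number of assistant turns in the conversation. A minimal sketch, assuming masked positions are marked with -100 as in the decode above; `unmasked_spans` is a hypothetical helper, not part of Unsloth:

```python
def unmasked_spans(labels, ignore_index=-100):
    """Return (start, end) index pairs for each run of labels that is not masked."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label != ignore_index and start is None:
            start = i  # a new unmasked run begins here
        elif label == ignore_index and start is not None:
            spans.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # the run reaches the end of the sequence
    return spans


# Toy labels with two trained-on runs, i.e. two assistant turns.
toy_labels = [-100, -100, 20, 21, -100, -100, 22]
print(unmasked_spans(toy_labels))  # [(2, 4), (6, 7)]
```

Applied to trainer.train_dataset[i]["labels"], the number of spans should roughly match the number of assistant turns in that example. A single span, or none at all, would suggest that the markers are not being matched on every turn.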