Is it possible to use `train_on_responses_only` with the Mistral template?

kldzj commented 3 weeks ago

`train_on_responses_only` expects an `instruction_part` and a `response_part`, which does not seem to work with the Mistral chat template.

Whenever I try some combination of [INST] markers, the space-substituted decode of the labels is always just a newline for me. Perhaps I'm doing it wrong.

Is it possible to train on responses only with the Mistral chat template? If so, could you kindly provide an example? :)
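One way to narrow down an all-masked result is to check whether the marker's token ids occur as a contiguous run inside the tokenized training example. The sketch below assumes markers are matched as token-id subsequences. `count_marker` is a hypothetical helper, and the ids are toy values, not real Mistral token ids. With a real tokenizer, the marker ids would come from something like `tokenizer(marker, add_special_tokens=False).input_ids`.

```python
def count_marker(input_ids, marker_ids):
    """Count contiguous occurrences of marker_ids inside input_ids."""
    n, m = len(input_ids), len(marker_ids)
    if m == 0:
        return 0
    return sum(1 for i in range(n - m + 1) if input_ids[i:i + m] == marker_ids)


# Toy ids (hypothetical): 3 stands for "[INST]", 4 stands for "[/INST]".
sample = [1, 3, 10, 11, 4, 20, 21, 2]

# Marker tokenized the same way as it appears inside the sample.
in_context_marker = [4]
# Marker whose standalone tokenization has an extra token (e.g. for a
# trailing space) that never follows it inside the sample.
standalone_marker = [4, 99]

print(count_marker(sample, in_context_marker))   # 1
print(count_marker(sample, standalone_marker))   # 0
```

A count of zero for the response marker would mean no response span can be located, which would be consistent with every label being masked.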

dendarrion commented 2 weeks ago

Try using [/INST] as the response part; that is what I am currently doing, although I am not sure whether this works for multi-turn chats.

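from unsloth.chat_templates import train_on_responses_only

# base_trainer is the already-constructed trainer to wrap
trainer = train_on_responses_only(
    base_trainer,
    instruction_part='[INST] ',
    response_part='[/INST] '
)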

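# Full tokenized example, decoded back to text
tokenizer.decode(trainer.train_dataset[0]["input_ids"])

'[INST] Very long content… [/INST] Assistant answer…'

# Replace masked labels (-100) with a space token so the remaining labels can be decoded
space = tokenizer(' ', add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[0]["labels"]])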

'                        Assistant answer…'
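To reason about the multi-turn case, here is a minimal sketch of response-only masking on a two-turn conversation. It is not Unsloth's actual implementation, only an illustration of the idea under the assumption that labels are kept from each response marker up to the next instruction marker. `find_all` and `mask_non_responses` are hypothetical helpers, and the token ids are toy values.

```python
IGNORE_INDEX = -100  # label value that the loss ignores


def find_all(ids, marker):
    """Return every start index where marker occurs contiguously in ids."""
    m = len(marker)
    return [i for i in range(len(ids) - m + 1) if ids[i:i + m] == marker]


def mask_non_responses(input_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for r in find_all(input_ids, response_ids):
        start = r + len(response_ids)
        # The response runs until the next instruction marker, or to the end.
        end = next((i for i in instruction_starts if i >= start), len(input_ids))
        labels[start:end] = input_ids[start:end]
    return labels


# Toy ids (hypothetical): 3 stands for "[INST]", 4 stands for "[/INST]".
# Two turns: user (10, 11) -> assistant (20, 21), user (12) -> assistant (22, 23).
ids = [3, 10, 11, 4, 20, 21, 3, 12, 4, 22, 23]
labels = mask_non_responses(ids, [3], [4])
print(labels)
# [-100, -100, -100, -100, 20, 21, -100, -100, -100, 22, 23]
```

Running the same space-substituted decode on a multi-turn sample from the real dataset should show every assistant turn, not just one of them, if the markers match in each turn.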

kldzj commented 2 weeks ago

Thanks a lot for your input! :)

I'll give it another shot, but I think I tried this combination before. It's crucial that it works with a multi-turn dataset.

unslothai / unsloth #1229