Closed nathan-az closed 5 months ago
I think this should be as simple as checking the tokenizer's template
(or default_chat_template
) and seeing if system
appears in it, before injecting the empty system message.
It sounds simple but I can't think of any cases in which it would fail. Can make a PR in the next few days.
Resolved by above
Noticed this while trying to run SFT on
mistralai/Mistral-7B-Instruct-v0.2
. It seems some models simply don't support a system prompt. Error was the following:TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
.My dataset does not include a system prompt. The behaviour is likely due to the empty system prompt being injected in
apply_chat_template
.Happy to make a PR to change this behaviour and not inject an empty system prompt, but not sure if we want to. The reverse may be true for some tokenizers, where a system prompt is absolutely required. Maybe a new argument
inject_system_prompt
orrequires_system_prompt
? There may also be a way to inspect the jinja templates to determine, but I haven't looked closely at this.