unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

The tokenizer does not have a {% if add_generation_prompt %} #1312

Open Galaxy-Husky opened 1 day ago

Galaxy-Husky commented 1 day ago

Hi,

After I upgraded unsloth from 2024.11.5 to 2024.11.7, it raised an error saying the tokenizer does not have a {% if add_generation_prompt %}. The model is shenzhi-wang/Llama3.1-8B-Chinese-Chat, which has the following chat template:

{% set loop_messages = messages %}
{% for message in loop_messages %}
    {% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>

    '+ message['content'] | trim + '<|eot_id|>' %}
    {% if loop.index0 == 0 %}
        {% set content = bos_token + content %}
    {% endif %}
    {{ content }}
{% endfor %}
{{ '<|start_header_id|>assistant<|end_header_id|>

' }}

Could you check and fix it?
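
For reference, the symptom can be seen with plain transformers as well: the template never references add_generation_prompt, so the flag is a no-op (a minimal sketch, assuming the tokenizer on the Hub still ships the template above):

from transformers import AutoTokenizer

# With this template both renderings come out identical, because the
# assistant header is appended unconditionally.
tokenizer = AutoTokenizer.from_pretrained("shenzhi-wang/Llama3.1-8B-Chinese-Chat")
messages = [{"role": "user", "content": "Hi"}]

with_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
without_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=False
)
print(with_prompt == without_prompt)  # True: the flag has no effect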

danielhanchen commented 7 hours ago

@Galaxy-Husky Ye so the chat template for that model looks incorrect - it should not have {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }} by default, but rather {% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}
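
Spelled out, the guarded ending behaves like this (a quick sketch using plain jinja2; transformers renders chat templates with a sandboxed Jinja environment, but the logic is the same):

from jinja2 import Template

# The assistant header is only emitted when add_generation_prompt is set,
# matching the stock Llama 3 template.
tail = Template(
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)
print(repr(tail.render(add_generation_prompt=True)))   # the header plus '\n\n'
print(repr(tail.render(add_generation_prompt=False)))  # '' (nothing is added)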

Galaxy-Husky commented 5 hours ago

> @Galaxy-Husky Ye so the chat template for that model looks incorrect - it should not have {{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }} by default, but rather {% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}

Yes, I agree. But 2024.11.5 would add {% if add_generation_prompt %} for me and fix the template, whereas 2024.11.7 does not, so I can't use the model. Is this the expected behavior?
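
In the meantime, one possible workaround is to add the guard yourself and save a patched copy of the tokenizer (a sketch, not a confirmed fix; the old_tail string and the output directory name are assumptions, so check the chat_template field in tokenizer_config.json first):

from transformers import AutoTokenizer

# Workaround sketch: wrap the trailing assistant header in the
# add_generation_prompt guard, matching the stock Llama 3 template.
tokenizer = AutoTokenizer.from_pretrained("shenzhi-wang/Llama3.1-8B-Chinese-Chat")

old_tail = "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
new_tail = "{% if add_generation_prompt %}" + old_tail + "{% endif %}"
tokenizer.chat_template = tokenizer.chat_template.replace(old_tail, new_tail)

# Save the patched tokenizer; "Llama3.1-8B-Chinese-Chat-patched" is a
# hypothetical local path.
tokenizer.save_pretrained("Llama3.1-8B-Chinese-Chat-patched")

Guarding the header this way mirrors the stock Llama 3 template, so prompts rendered for training and for generation stay consistent.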