huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Unexpected behavior in apply_chat_template function adding repeated assistant turns #171

Status: Closed (iseesaw closed this issue 4 months ago)

iseesaw commented 4 months ago

Description

In the apply_chat_template function used for DPO training, the generation prompt appears to be added even when add_generation_prompt is not set to True. With the Llama template, this appends a repeated assistant turn to the end of each rendered string, which may affect training outcomes.

Steps to Reproduce

  1. Apply the apply_chat_template function as follows:
    example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
    example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
    example["text_prompt"] = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
  2. Review the outputs in different parts of the dataset.

Expected Behavior

The function should not add generation_prompt to the outputs unless explicitly set by add_generation_prompt=True.

Observed Behavior

The outputs include repeated assistant turns in the Llama template, as shown in the examples below:

Prompt sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Chosen sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Rejected sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

xxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This repetition of the assistant's turn <|start_header_id|>assistant<|end_header_id|> appears irrespective of the setting of add_generation_prompt.
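To see how widespread the trailing header is in a processed dataset, a small check like the one below may help. It is only a sketch: the helper names `ends_with_generation_prompt` and `find_affected` are hypothetical, and it assumes each example is a dict holding the rendered `text_prompt`, `text_chosen`, and `text_rejected` strings from the reproduction steps.

```python
# Hypothetical helpers for auditing rendered examples; not part of any library.
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def ends_with_generation_prompt(text):
    """Return True if the rendered text ends with a bare assistant header."""
    # rstrip() drops the blank lines that usually follow the header.
    return text.rstrip().endswith(ASSISTANT_HEADER)


def find_affected(examples, keys=("text_prompt", "text_chosen", "text_rejected")):
    """Yield (index, key) for every rendered field that ends with the header."""
    for index, example in enumerate(examples):
        for key in keys:
            if ends_with_generation_prompt(example.get(key, "")):
                yield index, key


# Placeholder strings modelled on sample 14592 above.
sample = {
    "text_chosen": (
        "<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n\n"
        "xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    "text_rejected": (
        "<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n\n"
        "xxxx<|eot_id|>"
    ),
}
print(list(find_affected([sample])))
```

A header that opens a turn with content is not flagged; only a header left dangling at the very end of the string counts.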

Additional Information

Please investigate this issue as it might be influencing the training process negatively. Any guidance on the expected outputs and how to correctly use the apply_chat_template would also be appreciated.

iseesaw commented 4 months ago

Solved. The cause was that I was using the initial chat template, which was later updated.

See "Fix chat template to add generation prompt only if the option is selected" (#9).
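The difference between the two template versions can be sketched in plain Python. This is an illustrative stand-in, not the actual Jinja template from the repository: the function names are hypothetical, and the token layout is assumed from the sample outputs in this issue.

```python
# Illustrative stand-in for a Llama-3-style chat template (assumed layout).
BOS = "<|begin_of_text|>"
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def render_turns(messages):
    """Render BOS, then each message as header + content + <|eot_id|>."""
    parts = [BOS]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    return "".join(parts)


def render_always_prompt(messages, add_generation_prompt=False):
    """Initial behavior: the flag is ignored and the header is always appended."""
    return render_turns(messages) + ASSISTANT_HEADER


def render_conditional_prompt(messages, add_generation_prompt=False):
    """Updated behavior: the header is appended only when the flag is set."""
    text = render_turns(messages)
    if add_generation_prompt:
        text += ASSISTANT_HEADER
    return text


chosen = [{"role": "assistant", "content": "xxxxxx"}]
print(repr(render_always_prompt(chosen)))       # ends with the assistant header
print(repr(render_conditional_prompt(chosen)))  # ends with <|eot_id|>
```

With the conditional version, the default call leaves the chosen and rejected strings ending at `<|eot_id|>`, and the header appears only when `add_generation_prompt=True` is passed.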