Unexpected behavior in apply_chat_template function adding repeated assistant turns

Description

In the apply_chat_template function used for DPO training, the generation prompt appears to be appended even when add_generation_prompt is not set to True. With the Llama template, this leaves a trailing, repeated assistant header (<|start_header_id|>assistant<|end_header_id|>) at the end of each rendered sample, which may affect training outcomes.

Steps to Reproduce

Call apply_chat_template without passing add_generation_prompt:

example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
example["text_prompt"] = tokenizer.apply_chat_template(prompt_messages, tokenize=False)

Review the outputs in different parts of the dataset.
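
One way to review the outputs programmatically is to check whether each rendered string ends with an open assistant header. The following is a minimal sketch, not part of the handbook: the helper name ends_with_generation_prompt and the sample string are hypothetical, and the header text is copied from the outputs reported in this issue.

```python
# Llama 3 assistant header, copied from the outputs quoted in this issue.
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def ends_with_generation_prompt(text: str) -> bool:
    """Return True if the rendered text ends with an open assistant header."""
    # The header is followed by blank lines in the reported outputs,
    # so trailing whitespace is ignored before the comparison.
    return text.rstrip().endswith(ASSISTANT_HEADER)


# Hypothetical stand-in for one rendered "chosen" sample, shaped like
# the output reported in this issue.
rendered_chosen = (
    "<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

print(ends_with_generation_prompt(rendered_chosen))
```

Applying the same check to example["text_chosen"], example["text_rejected"], and example["text_prompt"] across the dataset would show how many samples are affected.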

Expected Behavior

The function should not add generation_prompt to the outputs unless explicitly set by add_generation_prompt=True.
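
To make the expectation concrete, here is a toy renderer that mimics the Llama 3 layout seen in this issue. It is only an illustration under that assumption: render_llama3_like is a hypothetical function, not the transformers implementation and not the model's actual Jinja template.

```python
from typing import Dict, List


def render_llama3_like(
    messages: List[Dict[str, str]], add_generation_prompt: bool = False
) -> str:
    """Toy renderer (hypothetical) that follows the layout shown in this issue."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # The open assistant header is appended only when explicitly requested.
    if add_generation_prompt:
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


chosen_messages = [{"role": "assistant", "content": "xxxxxx"}]

# Default call: the text stops at <|eot_id|>.
print(repr(render_llama3_like(chosen_messages)))
# Explicit request: the text ends with an open assistant header.
print(repr(render_llama3_like(chosen_messages, add_generation_prompt=True)))
```

With this behavior, the chosen and rejected texts would end at <|eot_id|>, and only a call that sets add_generation_prompt=True would end with the assistant header.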

Observed Behavior

The outputs include repeated assistant turns in the Llama template, as shown in the examples below:

Prompt sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Chosen sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

xxxxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Rejected sample 14592 of the raw training set:
<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

xxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The trailing assistant header <|start_header_id|>assistant<|end_header_id|> appears regardless of the value of add_generation_prompt.
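
Until the cause is confirmed, a possible stopgap is to strip the trailing header from the rendered chosen and rejected texts. This is a sketch under assumptions, not a confirmed fix: strip_generation_prompt is a hypothetical helper, and it assumes the unwanted suffix is exactly the header shown above plus optional whitespace.

```python
# Trailing header observed in the outputs above.
GENERATION_PROMPT = "<|start_header_id|>assistant<|end_header_id|>"


def strip_generation_prompt(text: str) -> str:
    """Remove one trailing assistant header (and the whitespace after it), if present."""
    trimmed = text.rstrip()
    if trimmed.endswith(GENERATION_PROMPT):
        return trimmed[: -len(GENERATION_PROMPT)]
    # No trailing header found: return the input unchanged.
    return text


# Hypothetical stand-in for the rejected sample shown above.
rendered_rejected = (
    "<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "xxxx<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

print(repr(strip_generation_prompt(rendered_rejected)))
```

Whether the prompt text should keep its trailing header depends on how the prompt and the completion are concatenated later in the pipeline, so the helper is probably best limited to the chosen and rejected texts.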

Additional Information

It is unclear whether this behavior affects the training outcomes negatively, but it certainly alters the intended structure of the training data.
No custom modifications were made to the function; the issue persists with the default implementation.

Please investigate this issue, as it might be affecting the training process negatively. Any guidance on the expected outputs and on how to use apply_chat_template correctly would also be appreciated.

huggingface / alignment-handbook