huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.54k stars 393 forks

jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/... #93

Open Feynman27 opened 9 months ago

Feynman27 commented 9 months ago

When running the DPO script, during this call

    #####################
    # Apply chat template
    #####################
    raw_datasets = raw_datasets.map(
        apply_chat_template,
        fn_kwargs={"tokenizer": tokenizer, "task": "dpo"},
        num_proc=data_args.preprocessing_num_workers,
        remove_columns=column_names,
        desc="Formatting comparisons with prompt template",
    )

I'm getting the error:

jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
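For context, Zephyr-style chat templates enforce strict role alternation inside the Jinja template itself. The sketch below is a simplified pure-Python reimplementation of that check, not the actual template code; the function name and message format are illustrative.

```python
# Simplified sketch of the alternation rule a Zephyr-style chat template
# enforces: an optional leading "system" message, then strictly
# user/assistant/user/assistant/... (names here are illustrative).
def check_roles_alternate(messages):
    # Many templates allow a single leading system message.
    if messages and messages[0]["role"] == "system":
        messages = messages[1:]
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(
                "Conversation roles must alternate "
                "user/assistant/user/assistant/..."
            )

# Alternating turns pass; two consecutive user turns trigger the error.
good = [{"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"}]
bad = [{"role": "user", "content": "hi"},
       {"role": "user", "content": "hi again"}]
```

A DPO example whose prompt/chosen/rejected turns end up with two same-role messages in a row would fail this check, which is consistent with the traceback above.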
Feynman27 commented 8 months ago

I think this has something to do with the tokenizer. I trained an SFT model, and am providing the local path to that model for DPO. If I use the default path from the hub alignment-handbook/zephyr-7b-sft-full, I don't get the error and DPO training starts fine.

Feynman27 commented 8 months ago

It appears the tokenizer_config.json written to the model output directory during the SFT stage needs to be replaced when loading the SFT model from that same local directory for the DPO phase. I swapped out the tokenizer_config.json from the SFT phase for the one from the model card, and DPO training now works. All other configs appear identical between the SFT and DPO phases (e.g. tokenizer.json).

This was not obvious at all. Could we add a note to the README or make this more foolproof?