huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.28k stars, 367 forks

Chat template is not loaded when evaluating on MT-bench #70

Closed ChenDRAG closed 7 months ago

ChenDRAG commented 7 months ago

The README states:

For both benchmarks, we have added support for the Zephyr chat template (which is the default produced by our scripts), so you can evaluate models produced by our scripts as follows:

The document then says:

Make sure the word zephyr exists in the --model-path argument when generating the model responses here. This will ensure the correct chat template is loaded.

However, this does not match what I see with the latest code in fastchat/llm_judge.

The chat template provided in tokenizer_config.json produces a prompt like:

'<|system|>\n \n <|user|>\n Please provide the content of conditions.....\n <|assistant|>\n '

However, when I evaluate Zephyr (with zephyr present in the model path), the prompt that is actually built is:

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. ### Human: Please provide the content of conditions..... ### Assistant:
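To make the difference between the two prompts concrete, here is a minimal sketch that rebuilds both layouts and checks which one a given prompt follows. Both layouts are transcribed from the strings quoted above, not taken from the FastChat or tokenizer code; the function names are hypothetical, and any special tokens not visible in the quoted strings are deliberately left out.

```python
# Illustrative only: both layouts are copied from the prompts quoted in this
# issue. Function names are hypothetical and no FastChat API is used.

ROLE_MARKERS = ("<|system|>", "<|user|>", "<|assistant|>")


def zephyr_style_prompt(user_message: str, system_message: str = "") -> str:
    """Build a prompt that uses the <|system|>/<|user|>/<|assistant|> markers."""
    return (
        f"<|system|>\n{system_message}\n"
        f"<|user|>\n{user_message}\n"
        f"<|assistant|>\n"
    )


def human_assistant_prompt(user_message: str) -> str:
    """Build a prompt in the '### Human: / ### Assistant:' layout seen at eval time."""
    preamble = (
        "A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions."
    )
    return f"{preamble} ### Human: {user_message} ### Assistant:"


def uses_role_markers(prompt: str) -> bool:
    """Return True only if every role marker appears in the prompt."""
    return all(marker in prompt for marker in ROLE_MARKERS)


if __name__ == "__main__":
    question = "Please provide the content of conditions....."
    for build in (zephyr_style_prompt, human_assistant_prompt):
        print(build.__name__, uses_role_markers(build(question)))
```

Running a check like `uses_role_markers` on the prompt logged during answer generation is a quick way to tell which of the two layouts was applied.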

My question is: which template did the Zephyr technical report use when reporting the 7.34 score on MT-bench? Should I modify the code hosted in fastchat/llm_judge so that it uses the chat template provided here?

P.S. My command is:

python3 -u gen_model_answer.py --model-path /home/huayu/git/alignment-handbook/data/zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix/zephyr_checkpoint-969 --model-id zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix_E1 --num-gpus-total 1
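As a sanity check on the README's requirement, the following sketch pulls the --model-path value out of the command above and confirms that it contains the word zephyr. The helper names are hypothetical, and the case-insensitive substring test is an assumption about how the path is matched, not a description of the actual FastChat logic.

```python
import shlex


def model_path_from_command(command: str) -> str:
    """Return the value that follows --model-path in a shell command string."""
    args = shlex.split(command)
    return args[args.index("--model-path") + 1]


def mentions_zephyr(model_path: str) -> bool:
    """Assumed check: case-insensitive substring match on 'zephyr'."""
    return "zephyr" in model_path.lower()


command = (
    "python3 -u gen_model_answer.py "
    "--model-path /home/huayu/git/alignment-handbook/data/"
    "zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix/zephyr_checkpoint-969 "
    "--model-id zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix_E1 "
    "--num-gpus-total 1"
)

if __name__ == "__main__":
    path = model_path_from_command(command)
    print(path)
    print("mentions zephyr:", mentions_zephyr(path))
```

Under that assumption the path passes the check, so a missing zephyr in the path does not by itself explain the prompt shown earlier.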

ChenDRAG commented 7 months ago

I got stupid