Chat template is not loaded when evaluating on MT-bench

The README states:

For both benchmarks, we have added support for the Zephyr chat template (which is the default produced by our scripts), so you can evaluate models produced by our scripts as follows:

It then says:

Make sure the word zephyr exists in the --model-path argument when generating the model responses here. This will ensure the correct chat template is loaded.

However, this does not seem to hold for the latest code in fastchat/llm_judge.

The chat template provided in tokenizer_config.json produces prompts like:

'<|system|>\n \n <|user|>\n Please provide the content of conditions.....\n <|assistant|>\n '

However, when I evaluate zephyr (making sure zephyr appears in the model path), the prompt that is actually used is:

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. ### Human: Please provide the content of conditions..... ### Assistant:
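One quick way to confirm which template a generated prompt followed is to look at its role markers. The snippet below is only a minimal sketch: `detect_template` is a hypothetical helper (not part of FastChat or the alignment-handbook), and the marker strings are simply taken from the two prompts quoted above.

```python
def detect_template(prompt: str) -> str:
    """Guess which chat template produced `prompt` from its role markers.

    Hypothetical helper for debugging; the markers are copied from the
    two prompts shown in this issue, not from any library.
    """
    if "<|user|>" in prompt and "<|assistant|>" in prompt:
        return "zephyr"
    if "### Human:" in prompt and "### Assistant:" in prompt:
        return "human_assistant"
    return "unknown"


# Shortened stand-ins for the two prompts above.
expected = "<|system|>\n\n<|user|>\nHello\n<|assistant|>\n"
observed = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "### Human: Hello ### Assistant:"
)

print(detect_template(expected))
print(detect_template(observed))
```

Running this on a prompt dumped just before generation would show whether the Zephyr markers are present, without having to read the whole prompt by eye.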

My question is: which template did the Zephyr technical report use when reporting the 7.34 score on MT-bench? Should I rewrite the code in fastchat/llm_judge so that it uses the chat template provided here?

P.S. My command is:

python3 -u gen_model_answer.py --model-path /home/huayu/git/alignment-handbook/data/zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix/zephyr_checkpoint-969 --model-id zephyr-7b-dpo-lora_bz8_8_1_lr_4_e3_logfix_E1 --num-gpus-total 1
