huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4

Output from zephyr-7b-dpo-qlora is weird #99

Open · ChenDRAG opened this issue 8 months ago

ChenDRAG commented 8 months ago

According to the model card, zephyr-7b-dpo-qlora is fine-tuned from zephyr-7b-sft-qlora. However, in the adapter config file the base model is set to mistralai/Mistral-7B-v0.1.

Also, I downloaded the model from https://huggingface.co/alignment-handbook/zephyr-7b-dpo-qlora and evaluated it on MT-Bench. The score is ~4.6 instead of the expected 7+, and the responses it generates are repetitive and erroneous. This may be because I used the wrong base model. Could you give me some instructions on how to evaluate zephyr-7b-dpo-qlora correctly?
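For reference, here is how I would expect the adapters to be loaded if the DPO adapter is meant to be stacked on top of the SFT adapter. This is a minimal sketch, assuming the SFT LoRA adapter must first be merged into the Mistral base before the DPO adapter is applied; I have not confirmed this is the recipe the authors intended:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Both adapters were trained against the same Mistral base model.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the SFT LoRA adapter and fold its weights into the base.
model = PeftModel.from_pretrained(base, "alignment-handbook/zephyr-7b-sft-qlora")
model = model.merge_and_unload()

# Apply the DPO LoRA adapter on top of the merged SFT model.
model = PeftModel.from_pretrained(model, "alignment-handbook/zephyr-7b-dpo-qlora")

tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-dpo-qlora")

# Quick smoke test with the chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Is this loading order correct, or should the DPO adapter be applied directly to mistralai/Mistral-7B-v0.1 as the adapter config suggests?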

P.S. I tried switching the base model to zephyr-7b-sft-qlora, but got the error below:

```
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /home/zephyr-7b-sft-qlora.
```
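I suspect this error occurs because the zephyr-7b-sft-qlora repo contains only LoRA adapter weights, not full model weights, so transformers cannot load that directory as a standalone base model. Under that assumption, a workaround sketch is to merge the SFT adapter into the base and save a full checkpoint, then point the base model path at the merged directory (the output path below is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
sft = PeftModel.from_pretrained(base, "alignment-handbook/zephyr-7b-sft-qlora")

# Fold the adapter weights into the base and write out a full checkpoint
# that can be used wherever a base model path is expected.
merged = sft.merge_and_unload()
merged.save_pretrained("/home/zephyr-7b-sft-qlora-merged")  # hypothetical path
AutoTokenizer.from_pretrained(
    "alignment-handbook/zephyr-7b-sft-qlora"
).save_pretrained("/home/zephyr-7b-sft-qlora-merged")
```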