huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4

Zephyr-dpo-full Checkpoints perform poorly on TruthfulQA. #122

Open xijiu9 opened 4 months ago

xijiu9 commented 4 months ago

Hello, I observe that neither the models I trained myself nor the official checkpoints provided by HuggingFace match the TruthfulQA results of Zephyr-7b-beta.

I used lm-evaluation-harness for the evaluation, with mc2 as the metric.

The result for HuggingFaceH4/zephyr-7b-beta is 55.15, and the result for the base model mistralai/Mistral-7B-v0.1 is 42.59. Both of these numbers match the reported values.

However, the result for alignment-handbook/zephyr-7b-dpo-full is only 45.07, and the result for alignment-handbook/zephyr-7b-sft-full is only 40.38.

Furthermore, the results from my own trained checkpoints also fall short: the sft-full result is 40.12, and the dpo-full result is 47.40.

The version of lm-evaluation-harness I used is this.
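
For reference, the evaluation was run roughly like the sketch below, using the harness's Python API. This is a minimal sketch, not the exact command: it assumes a v0.4-style harness where TruthfulQA mc2 is exposed as the task `truthfulqa_mc2` (older releases use `truthfulqa_mc` with mc2 as a sub-metric), and the `model_args` and batch size shown here are illustrative.

```python
# Minimal sketch of a TruthfulQA mc2 run with lm-evaluation-harness
# (v0.4-style API assumed; task and key names differ in older versions).
from lm_eval.evaluator import simple_evaluate

results = simple_evaluate(
    model="hf",  # HuggingFace causal-LM backend
    model_args="pretrained=alignment-handbook/zephyr-7b-dpo-full,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    batch_size=8,
)

# Per-task metrics live under results["results"]; the exact metric keys
# (e.g. "acc,none") depend on the harness version.
print(results["results"]["truthfulqa_mc2"])
```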

xijiu9 commented 4 months ago

I further ran evaluations on some other datasets; the alignment-handbook/zephyr-7b-dpo-full model still performs worse than HuggingFaceH4/zephyr-7b-beta.

[screenshot of evaluation results attached]