huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Why is zephyr-7b-dpo-lora fine-tuned from mistralai/Mistral-7B-v0.1 instead of the zephyr-7b-sft model? #39

Open ChenDRAG opened 8 months ago

ChenDRAG commented 8 months ago

There is a misalignment between zephyr-7b-dpo-lora and zephyr-7b-dpo-full: the former is fine-tuned from mistralai/Mistral-7B-v0.1, while the latter is fine-tuned from zephyr-7b-sft-full.

I wonder what causes this misalignment?

Also, have you benchmarked the performance improvement of the LoRA fine-tuning script? In my experiments, LoRA fine-tuning does not seem to provide any improvement over the base model on MT-Bench. I suspect some parameters may be incorrect.
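
For reference, a minimal sketch of applying a trained LoRA adapter before evaluation; if this step is skipped, MT-Bench sees only the unmodified base model, which would produce exactly the symptom described above. The checkpoint path is a placeholder, and this is a sanity check rather than the handbook's evaluation code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model that the LoRA run started from.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Apply the trained LoRA adapter on top of the base weights.
# Skipping this step means evaluating the base model unchanged.
model = PeftModel.from_pretrained(base, "path/to/dpo-lora-checkpoint")  # placeholder path

# Optionally merge the adapter into the weights for faster inference.
model = model.merge_and_unload()
```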

JiuhaiChen commented 8 months ago

I ran into the same issue.

edbeeching commented 8 months ago

In general, we observe better performance with the full fine-tune, although we did not perform a full hyperparameter sweep on the LoRA configs, so I am sure improvements can be made there.

As for the misalignment, I am not sure what you are referring to. The dpo-lora config fine-tunes on top of the sft-lora model. Can you provide some more detail?
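
Concretely, that stacking looks roughly like the sketch below; it assumes the alignment-handbook/zephyr-7b-sft-lora adapter id and illustrative LoRA hyperparameters, and is not the handbook's exact loading code:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

# model_name_or_path in the recipe points at the raw base model...
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# ...but the SFT LoRA adapter is loaded and merged on top of it, so the DPO
# stage effectively starts from the SFT model, not the raw base model.
sft = PeftModel.from_pretrained(base, "alignment-handbook/zephyr-7b-sft-lora")  # adapter id assumed
model = sft.merge_and_unload()

# A fresh LoRA adapter is then attached for the DPO stage itself.
dpo_config = LoraConfig(  # illustrative hyperparameters, not the recipe's exact values
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, dpo_config)
```

Under this setup, listing mistralai/Mistral-7B-v0.1 as the base model in the config is expected: the SFT weights come in through the adapter rather than through model_name_or_path.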