huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

DPO alignment doesn't work on Lora models as suggested #68

Open Abe13 opened 7 months ago

Abe13 commented 7 months ago

You claim that "In practice, we find comparable performance for both full and LoRA fine-tuning, with the latter having the advantage of producing small adapter weights that are fast to upload and download from the Hugging Face Hub."

However, when I try the LoRA DPO-aligned model that you have trained, alignment-handbook/zephyr-7b-dpo-lora, I see a severe performance degradation. Here is an example of model output that seems confused: [screenshot: confused model output]
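For reference, this is roughly how I load the adapter for inference (a minimal sketch assuming `transformers` + `peft`; the base checkpoint and generation settings are my assumptions, so please check the adapter config for the actual base model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base SFT checkpoint is an assumption; the adapter's config should name the real one
base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "alignment-handbook/zephyr-7b-dpo-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "What is DPO in one sentence?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```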

Even the training loss indicates that the model has not learned much:

[screenshot: training loss curve for LoRA DPO]

Here is the training loss for the full-model DPO alignment: [screenshot: training loss curve for full-model DPO]
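For context on how I read these curves, here is a minimal sketch of the standard DPO loss (not necessarily the exact implementation used in the handbook): with an untrained adapter the policy equals the reference model, so the loss starts at ln 2 ≈ 0.693, and a curve that stays flat around that value suggests the adapter is not learning any reward margin.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards are the log-prob differences to the reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Standard DPO loss: negative log-sigmoid of the reward margin
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# With zero margin (policy == reference), the loss is -log(0.5) = ln 2 ≈ 0.693
zeros = torch.zeros(4)
print(dpo_loss(zeros, zeros, zeros, zeros))  # tensor(0.6931)
```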

Could you please clarify? Does my observation differ from what you have experienced?

Thanks

lewtun commented 7 months ago

Hello @Abe13, thanks for raising this issue! Yes, there seems to be a discrepancy/regression that occurred during the porting of our internal codebase, and we're currently working on tracking it down. See this issue for related discussion: https://github.com/huggingface/alignment-handbook/issues/45