DPO alignment doesn't work on Lora models as suggested

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

Apache License 2.0

4.28k stars 367 forks source link

You claim that "In practice, we find comparable performance for both full and LoRA fine-tuning, with the latter having the advantage of producing small adapter weights that are fast to upload and download from the Hugging Face Hub."

However, when I try the Lora model DPO-aligned LLM that you have trained, alignment-handbook/zephyr-7b-dpo-lora, I experience a total performance degradation. Here is an example of model output that seems confused:

Even the training loss indicates that the model has not learned much

Here is the training loss for the full model DPO alignment.

Would you please do a clarification? Is my observation different from what you have experienced?

Thanks

huggingface / alignment-handbook

DPO alignment doesn't work on Lora models as suggested #68