huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.54k stars 393 forks

Does QLoRA DPO training support a reference model? #103

Open Harry-mic opened 8 months ago

Harry-mic commented 8 months ago

Hello! Thanks for your awesome work! I ran into a question while running DPO with QLoRA. I noticed this setting in the code:

    if model_args.use_peft is True:
        ref_model = None
        ref_model_kwargs = None

I also noticed that use_peft is set to true only in config_qlora.yaml. This means that when we run DPO training with QLoRA, no reference model is used at all.
Does this code support QLoRA training with a reference model? Thanks!
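
To make the question concrete, here is a minimal sketch of the branch as I understand it. Only the `use_peft` check and the two `None` assignments come from the snippet quoted above. The `ModelArgs` dataclass, the `select_reference_model` helper, and the fallback of reusing the policy model and its kwargs when PEFT is off are my own assumptions for illustration, not the repository's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class ModelArgs:
    # Hypothetical stand-in for the script's model arguments;
    # only the use_peft flag matters for this sketch.
    use_peft: bool = False


def select_reference_model(
    model_args: ModelArgs,
    model: Any,
    model_kwargs: Dict[str, Any],
) -> Tuple[Optional[Any], Optional[Dict[str, Any]]]:
    """Return the (ref_model, ref_model_kwargs) pair handed to the DPO trainer."""
    # This branch mirrors the quoted snippet: with PEFT enabled,
    # no separate reference model is configured.
    if model_args.use_peft is True:
        return None, None
    # Assumption: without PEFT, the reference model is loaded from the
    # same checkpoint and kwargs as the policy model.
    return model, model_kwargs


# Usage: "my-sft-checkpoint" is a placeholder name, not a real model id.
qlora_ref = select_reference_model(ModelArgs(use_peft=True), "my-sft-checkpoint", {})
full_ref = select_reference_model(ModelArgs(use_peft=False), "my-sft-checkpoint", {})
```

With `use_peft=True` the helper returns `(None, None)`, which is the case I am asking about.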