huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Always allow `ref_model=None` #2047

Open qgallouedec opened 2 months ago

qgallouedec commented 2 months ago

Feature request

For optimization with a reference model, in most cases the reference model is simply a copy of the trained model. We should require the user to specify a ref model only when they don't want to use the trained model as the reference.

Currently this is possible, but only when using PEFT, which is very counter-intuitive. And even in that situation, if you want to provide a ref model that is different from the trained model, you have to set force_use_ref_model. Even more counter-intuitive.

Currently

  1. model == ref_model, no PEFT:

     DPOTrainer(model=model, ref_model=ref_model)  # where ref_model must be another instance

  2. model == ref_model, with PEFT:

     DPOTrainer(model=model)

  3. model != ref_model, no PEFT:

     DPOTrainer(model=model, ref_model=ref_model)

  4. model != ref_model, with PEFT:

     args = DPOConfig(force_use_ref_model=True)
     DPOTrainer(model=model, ref_model=ref_model, args=args)
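The four cases above can be summarized as a dispatch on (ref_model, PEFT, force_use_ref_model). Here is a minimal, hypothetical sketch of that logic as described in this issue; the function name and structure are illustrative only and do not reproduce the actual trl source:

```python
# Hypothetical sketch of the current ref_model resolution described above.
# `resolve_ref_model` is an illustrative name, not part of the trl API.

def resolve_ref_model(model, ref_model=None, is_peft=False,
                      force_use_ref_model=False):
    """Return the reference model the trainer would use, or raise."""
    if is_peft:
        if ref_model is not None and not force_use_ref_model:
            # With PEFT, the trained model (adapters disabled) normally
            # serves as the reference, so an explicit ref_model is
            # rejected unless force_use_ref_model is set.
            raise ValueError(
                "Passing ref_model with a PEFT model requires "
                "force_use_ref_model=True in DPOConfig."
            )
        # Case 2: reuse the trained model; case 4: use the given one.
        return ref_model if ref_model is not None else model
    if ref_model is None:
        # Cases 1 and 3: without PEFT, a separate instance is required.
        raise ValueError(
            "Without PEFT you must pass a separate ref_model instance."
        )
    return ref_model
```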

Proposed

  1. model == ref_model:

     DPOTrainer(model=model)

  2. model != ref_model:

     DPOTrainer(model=model, ref_model=ref_model)
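Under the proposal, the resolution collapses to a single rule: build the reference from the trained model unless one is given explicitly. A minimal sketch of that rule (illustrative names; the real trainer would build a frozen copy, e.g. via trl's create_reference_model, rather than a plain deepcopy):

```python
import copy

# Hypothetical sketch of the proposed behavior: omitting ref_model makes
# the trainer derive the reference from the trained model itself.

def resolve_ref_model_proposed(model, ref_model=None):
    if ref_model is None:
        # Default: the reference starts as a copy of the trained model.
        return copy.deepcopy(model)
    # Explicit ref_model: use it as-is.
    return ref_model
```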

Motivation

Make the library more intuitive to use.

Your contribution

For sure ;)

RylanSchaeffer commented 2 months ago

Quoting from the other issue:

However, handling ref_model/model is pretty tricky currently, maybe wait until https://github.com/huggingface/trl/issues/2047 is solved?

Is there an explanation for why ref_model and model are tricky? If I were to work on this, should I be wary of any challenges that might pop up?

qgallouedec commented 2 months ago

I believe this may be due to the implementation being carried out in multiple stages: first the initial version, followed by PEFT support, then integration with DeepSpeed... It's probably a good time to re-think it as a whole. However, we must be careful not to introduce any regressions or breaking changes: we must test all the parameter combinations.

RylanSchaeffer commented 2 months ago

In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

qgallouedec commented 2 months ago

In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?

Perhaps you should give it a try. It's difficult to assess the changes involved.
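The equality check mentioned above could be as small as guarding against the same object being passed twice. A minimal, hypothetical sketch (the function name is illustrative, not trl code):

```python
# Hypothetical guard for the "equality check" fix discussed above:
# reject the same instance being passed as both model and ref_model.

def check_distinct(model, ref_model):
    if ref_model is not None and ref_model is model:
        raise ValueError(
            "model and ref_model must be different instances; "
            "omit ref_model to reuse the trained model as the reference."
        )
```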

qgallouedec commented 1 month ago

Implemented for Online DPO in #2041; it can probably be taken as a reference.