qgallouedec opened 2 months ago
Quoting from the other issue:
However, handling ref_model/model is pretty tricky currently, maybe wait until https://github.com/huggingface/trl/issues/2047 is solved?
Is there an explanation for why ref_model and model are tricky? If I were to work on this, should I be wary of any challenges that might pop up?
I believe this may be due to the implementation being carried out in multiple stages: first the initial version, followed by PEFT support, then integration with DeepSpeed... It's probably a good time to rethink it as a whole. However, we must be careful not to introduce any regressions or breaking changes: we must test all the parameter combinations.
In that case, I think it makes sense to just fix the other issue first because the fix for that issue is an equality check, right?
Perhaps you should give it a try. It's difficult to assess the changes involved.
Implemented for Online DPO in #2041. It can probably be taken as a reference.
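For a rough idea of the pattern, here is a minimal sketch (not the actual #2041 code; the helper name resolve_ref_model and the is_peft_model flag are hypothetical): when no ref_model is given, the reference is derived from the trained model, either as a frozen copy or, under PEFT, by disabling the adapters when computing reference log-probs.

```python
from copy import deepcopy

def resolve_ref_model(model, ref_model=None, is_peft_model=False):
    """Hypothetical helper: decide which reference model to use."""
    if ref_model is not None:
        # The user explicitly wants a reference model different from the trained one.
        return ref_model
    if is_peft_model:
        # With PEFT, no separate copy is needed: disabling the adapters at
        # log-prob computation time recovers the frozen base model.
        return None
    # Default: a frozen copy of the initial policy serves as the reference.
    ref = deepcopy(model).eval()
    for param in ref.parameters():
        param.requires_grad_(False)
    return ref
```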
Feature request
For optimisation with a reference model, in most cases the reference model is just a copy of the trained model's initial state. The user should only have to specify a ref model when they don't want to reuse the trained model.
Currently this is possible, but only when using PEFT, which is very counter-intuitive. And even in that situation, if you want to provide a ref model that is different from the trained model, you have to set force_use_ref_model, which is even more counter-intuitive.
Currently
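Roughly this (a sketch assuming DPOTrainer; the dataset is a placeholder and exact argument names may vary between TRL versions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
# The reference model has to be loaded and passed explicitly, even though it is
# just a second copy of the model being trained.
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

train_dataset = ...  # placeholder: a preference dataset with prompt/chosen/rejected columns

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=DPOConfig(output_dir="dpo-model"),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
```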
Proposed
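Something along these lines, where omitting ref_model means "use a frozen copy of the trained model as the reference", regardless of whether PEFT is used (same assumptions as the sketch above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

train_dataset = ...  # placeholder: same preference dataset as above

trainer = DPOTrainer(
    model=model,
    # No ref_model: the trainer derives the reference from `model` itself.
    args=DPOConfig(output_dir="dpo-model"),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)

# ref_model would only be passed when it should differ from the trained model:
# trainer = DPOTrainer(model=model, ref_model=other_model, ...)
```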
Motivation
Make the library more intuitive to use.
Your contribution
For sure ;)