stanfordnlp / pyreft

ReFT: Representation Finetuning for Language Models
https://arxiv.org/abs/2404.03592
Apache License 2.0

[P1] Is it possible to merge the base model + ReFT model into a single model? #99

Closed: celsowm closed this 1 month ago

celsowm commented 1 month ago

Hi! Is it possible to merge the base model and the ReFT model into a single model?

Loading both models every time is not ideal.

Thanks in advance!

frankaging commented 1 month ago

@celsowm Thanks for your question! Currently, it is not possible to merge the "effect" of an intervention into the model weights, because we intervene on a parameter-less stream: the residual stream.

However, I do want to point out that this "inability to merge" is actually a feature of ReFT. For instance, if you train a set of interventions on the same base LM, with each intervention adapted to a distinct domain, you can attach different interventions on the fly to steer model behavior differently, with minimal switching cost.
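To illustrate why the effect cannot be folded into the weights, here is a minimal conceptual sketch (not the pyreft API) of a LoReFT-style edit, which rewrites a hidden activation h on the residual stream as h' = h + R^T (W h + b - R h), following the paper. Since the edit is applied to activations rather than to any weight matrix, there is no parameter to merge it into; swapping the hypothetical (R, W, b) triple swaps the adaptation instantly. All names below are illustrative.

```python
def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def loreft_intervene(h, R, W, b):
    """Apply h' = h + R^T (W h + b - R h).

    h is a length-d activation on the residual stream; R and W are
    r x d low-rank projections; b has length r. The edit lives on the
    activation, so no base-model weight is changed.
    """
    Wh = matvec(W, h)
    Rh = matvec(R, h)
    delta = [wh + bi - rh for wh, bi, rh in zip(Wh, b, Rh)]
    correction = matvec(transpose(R), delta)
    return [hi + ci for hi, ci in zip(h, correction)]

# Swapping "domains" is just choosing a different (R, W, b) triple;
# the base activation h and the frozen model are untouched.
h = [1.0, 2.0, 3.0]
R = [[1.0, 0.0, 0.0]]      # rank-1 subspace (r=1, d=3)
W_a = [[0.5, 0.5, 0.0]]    # hypothetical "domain A" projection
b_a = [0.1]
print(loreft_intervene(h, R, W_a, b_a))
```

Note that when W equals R and b is zero, the correction term vanishes and the activation passes through unchanged, which is why the intervention is a targeted edit in a low-rank subspace rather than a global weight update.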

I am closing this issue, but feel free to follow up if you have more questions!