huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Integration of merge-kit into PEFT #2179

Open ParagEkbote opened 2 weeks ago

ParagEkbote commented 2 weeks ago

Feature request

Integrate mergekit functionality into the PEFT library so that users can leverage the merging techniques it provides.

This could include additional merging techniques beyond TIES and DARE, which PEFT already supports natively.

References:

1) https://github.com/arcee-ai/mergekit
2) https://huggingface.co/docs/peft/main/en/developer_guides/model_merging#model-merging
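
For context, a minimal sketch of how the existing TIES/DARE support is used through PEFT's `add_weighted_adapter` API; the model and adapter paths below are placeholders, and the DARE variants work the same way via `combination_type`:

```python
# Minimal sketch of PEFT's existing adapter-merging support (TIES shown here).
# Model and adapter paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Combine both LoRA adapters into a new one; `density` controls how many
# parameters are kept before the TIES sign election, `weights` scales each adapter.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.5,
)
model.set_adapter("merged")
```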

Motivation

For beginners, especially those new to fine-tuning large language models, using mergekit requires familiarity with multiple merging methods and careful handling of model weights.

PEFT could bridge this gap by providing an easy-to-use, fully integrated solution for merging model weights.

Your contribution

With support and guidance, I could help with the integration.

BenjaminBossan commented 2 weeks ago

Thanks for opening this feature request and offering to help with the integration.

For my understanding, you are talking about the methods mentioned here? Do you mean that you would like to port over those methods to PEFT, similar to what we have for DARE and TIES, or is your suggestion to integrate the mergekit package itself?

ParagEkbote commented 2 weeks ago

@BenjaminBossan You are correct, I am indeed referring to the methods you mentioned. Firstly, would porting over the entire mergekit package be better in the long term?

Secondly, if we instead port individual methods, as was done for DARE and TIES, would that be simpler to implement and provide a better model-merging experience than mergekit itself?

I think it depends on how quickly we could add the merging methods to PEFT. Could you please let me know which of the two approaches seems simpler to implement?

BenjaminBossan commented 2 weeks ago

I don't think it will be easy to answer which of the two approaches is better. That would require a very good understanding of how mergekit is implemented, which I don't have.

To me, judging from classes like this one, mergekit looks more like a framework than a utility package, even though there appear to be some standalone functions. A utility package would be easier to integrate than a framework. As such, I tend towards copying the methods into PEFT instead of relying on mergekit directly.

Before doing that, however, it would be important to figure out how beneficial the integration would be. Your main argument, as I understand it, is that it would simplify the usage of mergekit functionality with PEFT. However, this could theoretically also be achieved through documentation and examples, unless mergekit is built in a way that makes it incompatible with PEFT. Do you have some experience with using mergekit together with PEFT?

ParagEkbote commented 2 weeks ago

I have a couple of points I'd like to add. Firstly, after merging two models, mergekit has no native support for further fine-tuning with LoRA or with the TRL library to help improve performance after the merge. Adding these methods to PEFT could make the process simpler and more effective.

Secondly, model merging is increasingly being adopted by companies to improve model performance, so I think it would be beneficial to add the methods to the PEFT library if integrating mergekit itself proves difficult.

Let me know if this makes sense.
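
To illustrate the workflow being described, here is a rough sketch of the post-merge fine-tuning step: load a merged checkpoint (e.g. one produced by mergekit) and attach a fresh LoRA adapter with PEFT so it can be trained further, which is the part mergekit itself does not cover. Paths and hyperparameters are placeholders:

```python
# Rough sketch: continue training a merged checkpoint with a new LoRA adapter.
# Paths and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

merged = AutoModelForCausalLM.from_pretrained("path/to/merged-model")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(merged, lora_config)
model.print_trainable_parameters()

# From here, `model` can be trained like any other PEFT model, e.g. with
# transformers' Trainer or TRL's SFTTrainer.
```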

References:

1) https://arxiv.org/pdf/2405.15032
2) https://cameronrwolfe.substack.com/p/model-merging
3) https://arxiv.org/pdf/2407.06089

BenjaminBossan commented 2 weeks ago

Thanks for the additional context. In that light, I think the more promising way forward is to integrate the methods directly into PEFT without relying on mergekit. When it makes sense, we can copy, or at least stick very closely to, the mergekit functions that implement the actual merging logic; in that case we should of course attribute mergekit.

If you want to give this a go, that would be fantastic. Feel free to ask any question that comes up and to open a draft PR to get early feedback. You can check how DARE and TIES are implemented and see if you could add a new method; maybe you have an idea which ones are most popular.

The current implementations live here:

https://github.com/huggingface/peft/blob/214345ee4787b04636947af50ffb2f869c11613b/src/peft/tuners/lora/model.py#L595-L839

The tests can be found here:

https://github.com/huggingface/peft/blob/214345ee4787b04636947af50ffb2f869c11613b/tests/testing_common.py#L1318-L1500

This code is a bit messy and I wouldn't mind refactoring it, but that can be left as an exercise for the future.
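
As a purely illustrative example of what porting one more mergekit-style method could look like, here is a hypothetical standalone tensor function for SLERP (one of the more popular merging methods), written in the spirit of the existing DARE/TIES helpers. It is a sketch, not code taken from PEFT or mergekit:

```python
import torch


def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical spherical linear interpolation (SLERP) between two tensors.

    Flattens both tensors, interpolates along the arc between them, and falls
    back to plain linear interpolation when they are nearly colinear.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
    omega = torch.arccos(dot)
    if omega.abs() < 1e-4:
        # Nearly colinear vectors: SLERP degenerates to LERP.
        merged = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        merged = (
            torch.sin((1 - t) * omega) / sin_omega * a_flat
            + torch.sin(t * omega) / sin_omega * b_flat
        )
    return merged.reshape(a.shape).to(a.dtype)
```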

> mergekit has no native support for further fine-tuning with LoRA or with the TRL library to help improve performance after the merge.

Just for my understanding, the practice would be to merge two trained LoRA adapters, then fine-tune this merged adapter further? Interesting, I didn't know this was commonly done.

ParagEkbote commented 2 weeks ago

> If you want to give this a go, that would be fantastic. Feel free to ask any question that comes up and to open a draft PR to get early feedback. You can check how DARE and TIES are implemented and see if you could add a new method; maybe you have an idea which ones are most popular.

Thank you for your prompt reply. Could you please assign this issue to me? That would make it easier to ask for help from the Hugging Face user base and the OSS community. I will start working on it in a few weeks, if that works.

> Just for my understanding, the practice would be to merge two trained LoRA adapters, then fine-tune this merged adapter further? Interesting, I didn't know this was commonly done.

This process is done to further improve a model's convergence. Note that model merging can be applied to two full LLMs as well, not just LoRA adapters.

Additionally, I had a query: are we trying to implement something similar to this Arcee article, or are we adding support for merging full language models as described in this blog?

References:

1) https://blog.arcee.ai/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model/

2) https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit.html

ParagEkbote commented 2 weeks ago

Could you please clarify?

cc: @BenjaminBossan

BenjaminBossan commented 2 weeks ago

Sorry, what would you like me to clarify?

ParagEkbote commented 2 weeks ago

I should implement the merging methods for LoRA adapters only, right? Not for entire language models?

https://blog.arcee.ai/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model/

BenjaminBossan commented 2 weeks ago

My understanding is that new merging methods for LoRA can be implemented, similar to what we have for DARE and TIES. The LoRA extraction feature seems to be a completely separate issue, as it deals with fully fine-tuned models, whereas in PEFT, we can assume that the LoRA adapter already exists separately (whether via extraction or not wouldn't be relevant). LMK if you had other ideas.

ParagEkbote commented 2 weeks ago

I'll do some research on the merging methods and reach out if I need input on specific approaches.

Thanks for the feedback and I'll definitely keep in touch.