huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0
16.46k stars 1.62k forks source link

support for heterogeneous types for `modules_to_save` #2136

Closed saeid93 closed 2 days ago

saeid93 commented 1 month ago

Feature request

From my understanding of the current implementation, the modules_to_save wrappers are currently limited to copying only one specific layer of the model (reference: https://github.com/huggingface/peft/blob/859fd880e66fd152401b48c1f8c8bacbcc576f70/src/peft/utils/other.py#L192). Adding this feature would allow for different sets of modules to be saved for each LoRA. For instance, this could support multiple LoRA classifiers, each with classifier layers of varying sizes applied to the last layer.

Motivation

This feature is particularly useful for the final classifier layer. Currently, I have a model with multiple LoRAs attached for a classification task, but the classifier layers are not all the same size. As a result, I need to maintain several models, grouping LoRAs with the same classifier size into the same base model. However, since the core model remains identical, it should be possible to use a single base model for all of them, especially since we are training the classifier layers from scratch. A potential solution could be to introduce an additional option, allowing users to specify the modules_to_save class for the classifier layer, instead of simply copying the existing layer.

Your contribution

I'm happy to explore possible solutions and potentially contribute a PR if this is considered a valuable addition to the library.

BenjaminBossan commented 1 month ago

Hmm, I see the problem but I'm not sure if modules_to_save is the right place to add a solution. The issue is that modules_to_save will create a copy of the targeted layer. Therefore, this copy will start from the same parameters as the original layer. If we allow to change the shape of the parameters, they can no longer be a copy of the base parameters but would need to be randomly initialized.

I understand that for some tasks, we just randomly initialize the classifier head, so it doesn't really matter if modules_to_save creates a copy with the exact same parameters or if the parameters are reinitialized randomly. But for other use cases, modules_to_save can target already trained layers, so there it does matter.

For this reason, I feel like modules_to_save is not the right place to put such functionality. If I had this problem, I would probably first create the base model and add multiple classifier heads that can be switched via some parameter (basically what you would have to do if you did full fine-tuning). Then you could create one PEFT adapter per task/classifier head that each has its own modules_to_save which targets one of these classifier heads.

It is a bit wasteful since we create a copy for each of these heads, so if memory is very tight, this would not be an optimal solution. Given that you probably don't need the weights of the randomly initialized heads, you could probably delete those to recuperate the memory. It would just mean that you can't use the model without the adapter (e.g. disabling adapters would not work) but there would not really be any reason to do that.

saeid93 commented 1 month ago

@BenjaminBossan Thank you for your comment and for proposing a solution. Having multiple classifier heads in the base model based on the LoRA requirements would indeed solve my issue. As you suggested, I could remove the base heads and keep only those needed for the LoRAs. Thank you!

Do you think this is something worth exploring as a feature for the main library? For example, instead of modifying the modules_to_save as I initially suggested, we could introduce a new concept or class, such as dynamic_adapter_modules, which could be specified for each LoRA and might not necessarily exist in the base model.

BenjaminBossan commented 1 month ago

Good question. I'm a little bit torn about the suggestion, since at first glance, it has nothing really to do with parameter-efficient fine-tuning and thus would not fit the spirit of the PEFT library.

However, having a convenient way to switch out specific modules can be handy in a few situations when working with these models. PEFT already implements something like that with the option to load multiple adapters and switch between them, so there is a clear connection in the implementation itself, even if not in spirit. Therefore, if you can come up with a very nice API to achieve this, we I would be open to the possibility. So I would start from the API and only when we can agree on something nice, start working on the implementation.

As to the implementation, I would probably still use ModulesToSaveWrapper under the hood. The reason is that as is, it is treated as a special case throughout several parts of the code. If we wanted to add another special case like that, it would be a lot of work to ensure it is integrated correctly everywhere. So probably this new feature would be built on top of ModulesToSaveWrapper but it could still have a separate entry in the config.

saeid93 commented 1 month ago

Yes, that makes sense, especially in scenarios where PEFT is used as a repository for different adapters with a similar base. However, unless there's a clean way to implement this, it might be out of scope for the current framework. I'll give it some thought and see if I can find a straightforward solution when I get the chance.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.