huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Add new features: Safe LoRA #2098

Open chiayi-hsu opened 4 days ago

chiayi-hsu commented 4 days ago

Hello,

We have published a paper called Safe LoRA (https://arxiv.org/abs/2405.16833). This work focuses on improving the safety of well-trained LoRA models. Additionally, I have provided an example implementation in the model.py file to illustrate how to apply the Safe LoRA approach effectively.

BenjaminBossan commented 3 days ago

Thanks for proposing this PR to add the Safe LoRA method to PEFT. I have only skimmed the code and paper, so this is not an in-depth review yet.

Based on what I saw, this is my high-level understanding. As a user, I start out by training a LoRA adapter on my dataset using an aligned model as the base model. Next, I want to restore safety, which may have been reduced by training. For this, I take my trained LoRA model, the base model, and the aligned model. Safe LoRA then creates a new adapter that "injects" the alignment back into my LoRA adapter. Please correct me if my understanding is wrong.

Implementation-wise, I wonder if we really need a SafeLoRA class. Here is a proposal for a different user API, LMK what you think:

peft_model = ...  # my trained LoRA model

# apply_safe_lora is a new function that does what SafeLoRA currently does
# additional arguments like select_layers_type are optional args for apply_safe_lora
aligned_model = apply_safe_lora(peft_model, base_model_id, aligned_model_id)

# persist the new safe LoRA adapter
aligned_model.save_pretrained(...)

IMO, this would be a simpler API and it would achieve the same thing at the end.

In this example, the aligned_model could actually be the same peft_model as initially, just with a new LoRA adapter loaded which is the Safe LoRA adapter.

Another concern I saw when I read the code is that it looks like it will require a lot of memory. IIUC, we need to have the PEFT model, a copy of the PEFT model, the base model, and the aligned model in memory all at once. Even on CPU RAM, this could be difficult to achieve for many users. I wonder if it would be possible to load the weights from the base model and the aligned model one at a time, at least as an option.

I have already implemented something like this for another use case. Here is the code:

https://github.com/huggingface/peft/blob/ccc350151f95a9ff95da046bae5671da75eab52f/src/peft/utils/loftq_utils.py#L263-L411

It only works for safetensors, because pickle does not allow loading the weights lazily, but I still think it could be a nice addition.

Also, do we really need a full copy of the PEFT model? Would it be sufficient to only copy the LoRA weights? Ideally, I would like to see a solution that works even if the user has less than twice the memory required for the base model. If this doesn't work, it's also okay, but it would greatly reduce the number of users who can use the method.

chiayi-hsu commented 4 hours ago

Thank you for taking the time to read the code and the paper.

Yes, your understanding is correct.

Since SafeLoRA is not a new training method for LoRA, but rather a process that enhances the safety of a well-trained LoRA model through certain operations, it may not be necessary to have a separate SafeLoRA class.

The most CPU memory-intensive operation here is likely get_aligned_matrix(), since it requires loading both the base model and the aligned model. If both models could be loaded lazily, subtracting the weights at the same position one tensor at a time to obtain the aligned matrix, that could be a solution.
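That per-layer computation could look roughly like the following numpy sketch. The generator of `(name, base_w, aligned_w)` pairs stands in for whatever lazy loader is used, and the projection `C = V @ V.T / ||V||_F**2` (with `V` the aligned-minus-base difference) is an illustrative form of the alignment projection; the exact normalization should follow the paper:

```python
import numpy as np

def projection_matrices(weight_pairs):
    """Build a per-layer projection matrix from the alignment direction.

    weight_pairs yields (name, base_w, aligned_w) one layer at a time, so
    only a single layer's weights need to be in memory at once.  The form
    C = V @ V.T / ||V||_F**2 with V = aligned_w - base_w is a sketch; the
    exact construction should follow the Safe LoRA paper.
    """
    mats = {}
    for name, base_w, aligned_w in weight_pairs:
        v = aligned_w - base_w              # alignment matrix V for this layer
        fro2 = float(np.sum(v * v))         # squared Frobenius norm of V
        mats[name] = v @ v.T / fro2
        # base_w, aligned_w, and v can be freed here before the next layer
    return mats
```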

Regarding your last question, my current code in projected_weighted() does load the complete PEFT model (aligned model + LoRA). However, it is indeed possible to operate only on the LoRA weights without needing to load the weights of the aligned model as well. Once the user has obtained the new LoRA weights, they can then add them to the original aligned model.
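Operating only on the LoRA weights might then look like the sketch below. All names are hypothetical (not PEFT API): `lora_A` and `lora_B` are the usual low-rank factors, `C` is the per-layer projection matrix, and the cosine-similarity gate is an illustrative stand-in for the paper's layer-selection rule. The key point is that `C @ (B @ A) == (C @ B) @ A`, so projecting `lora_B` alone projects the whole update without touching the aligned model's weights:

```python
import numpy as np

def project_lora(lora_A, lora_B, C, threshold=0.5):
    """Project a single LoRA update without loading the base weights.

    lora_B @ lora_A is the low-rank update delta_W.  If the projected
    update C @ delta_W diverges from the original (cosine similarity
    below `threshold`), replace lora_B with C @ lora_B, which projects
    the whole update since C @ (B @ A) == (C @ B) @ A.  The gating rule
    and threshold here are illustrative, not the paper's exact criterion.
    """
    delta = lora_B @ lora_A
    projected = C @ delta
    cos = float(np.sum(delta * projected)) / (
        np.linalg.norm(delta) * np.linalg.norm(projected) + 1e-12
    )
    if cos < threshold:
        return lora_A, C @ lora_B
    return lora_A, lora_B
```

The returned factors can then be saved as a regular adapter and loaded on top of the original aligned model, as described above.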

BenjaminBossan commented 1 hour ago

Thanks for confirming and further explaining. Would you be willing to make the suggested changes? I think it would help a lot with user adoption of the new method.