huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Negative weights on add_weighted_adapter #1907

Closed · freddy5566 closed this issue 1 day ago

freddy5566 commented 5 days ago

System Info

python=3.8 peft=0.11.1

Who can help?

@BenjaminBossan @sayakpaul


Reproduction

I would like to perform task-vector arithmetic on LoRA adapters in the following fashion: task = pre_trained + LoRA_1 - LoRA_2. Here is the code that I used:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Load the frozen base model.
base_model = LlamaForCausalLM.from_pretrained(
    pre_trained_model_name_or_path,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the first LoRA adapter onto the base model.
first_model_name = model_name_or_pathes[0].split("/")[-1]
self.model = PeftModel.from_pretrained(base_model, model_name_or_pathes[0], first_model_name)

# Load the remaining LoRA adapters under their own names.
names = [first_model_name]
for lora_path in model_name_or_pathes[1:]:
    name = lora_path.split("/")[-1]
    names.append(name)

    self.model.load_adapter(lora_path, adapter_name=name)

# Combine the adapters, using a negative weight for the second one.
adapter_name = "-".join(names)
self.model.add_weighted_adapter(
    adapters=names,
    weights=[1, -1],
    adapter_name=adapter_name,
    combination_type=combine_method,
    density=density,
)
self.model.set_adapter(adapter_name)
self.model.eval()

But I got this error message: ValueError: math domain error

I believe it is caused by these lines:

https://github.com/huggingface/peft/blob/09358aad308604f7a132cf94709bcc9399a2e1ab/src/peft/tuners/lora/model.py#L791-L794
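A minimal illustration of how the square root in those lines fails once the combination weight is negative; the scaling value here is hypothetical:

import math

# math.sqrt of (negative weight) * (positive LoRA scaling) is undefined,
# hence the ValueError. The scaling value below is made up for illustration.
scaling = 16 / 8              # hypothetical alpha / r
weight = -1                   # the negative weight passed to add_weighted_adapter
math.sqrt(weight * scaling)   # ValueError: math domain error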

Expected behavior

After reading the code, I have the following questions:

  1. Why should we apply the scaling on the weight?
  2. Why should we calculate the square root on the scaled weights?
  3. I also noticed that, in this case, the behavior is slightly different from using merge_and_unload when there is only one LoRA adapter: merge_and_unload does not multiply by math.sqrt(1 * target.scaling[adapter]); it only multiplies by the scaling (alpha / rank), so the merged weight = original weight + BA * scaling (see the sketch after this list). https://github.com/huggingface/peft/blob/09358aad308604f7a132cf94709bcc9399a2e1ab/src/peft/tuners/lora/layer.py#L455
  4. What is the correct way to perform task forgetting under this setting?
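A toy numeric sketch of the merge relation mentioned in question 3 (merged weight = original weight + BA * scaling); the shapes and values are made up for illustration:

import torch

d, r = 4, 2
alpha = 16
scaling = alpha / r            # hypothetical LoRA scaling

W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d)          # lora_A
B = torch.randn(d, r)          # lora_B

# What merge_and_unload computes for a single standard Linear LoRA layer.
W_merged = W + B @ A * scaling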

Thanks!

BenjaminBossan commented 5 days ago

For more context on why the weights are scaled like this, please check this discussion: #1155. This should address questions 1-3.

4. What is the correct way to perform task forgetting under this setting?

We have not experimented with task forgetting, and I'm not sure whether merging with a negative weight is a viable solution. If you could explain further what the base model is, what you want it to forget, and what the learned adapters were trained on, I might be able to make some suggestions.

freddy5566 commented 3 days ago

Hello @BenjaminBossan,

Thank you for your quick response. I think I understand the motivation behind the current implementation.

However, regarding question 4, I am afraid I cannot disclose any details. The idea behind it is this: say I trained LLaMA with LoRA on a multi-task dataset (task a and task b), and I also have a single-task dataset (task a). Now I want to perform task forgetting via: task_b = pre_trained + LoRA_1 (tasks a and b) - LoRA_2 (task a).

I was wondering if you have any suggestions for this use case.

Thanks!

BenjaminBossan commented 2 days ago

Thanks for explaining the general idea. If your two LoRA adapters target the same layers and have the same rank, you could try loading the state dict of LoRA_1, subtracting the state dict of LoRA_2 manually, and then loading the resulting LoRA weights onto the model. This should have roughly the effect you wanted to achieve with the negative weights. Whether it actually leads to forgetting task a while leaving task b intact, I'm not sure; I would not really expect it to work, as it assumes the two tasks are completely orthogonal.
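A minimal sketch of this suggestion, assuming both adapters target the same modules with the same rank; the paths and adapter name below are placeholders:

from peft import PeftModel, load_peft_weights, set_peft_model_state_dict

# base_model: the already-loaded LlamaForCausalLM
model = PeftModel.from_pretrained(base_model, lora_1_path, adapter_name="forget")

sd_1 = load_peft_weights(lora_1_path)   # LoRA_1, trained on tasks a and b
sd_2 = load_peft_weights(lora_2_path)   # LoRA_2, trained on task a only

# Subtract the second adapter's weights key by key; this requires identical
# keys and shapes in both state dicts.
diff = {key: sd_1[key] - sd_2[key] for key in sd_1}

# Overwrite the loaded adapter's weights with the difference.
set_peft_model_state_dict(model, diff, adapter_name="forget")
model.eval()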

freddy5566 commented 1 day ago

Thanks! It turns out it works to some degree... but it might not be an ideal way to perform such a task. Anyway, thanks for your quick guidance.

BenjaminBossan commented 1 day ago

Great, then I'll wish you luck with your further experiments. If you figure out a way to make forgetting tasks learned by LoRA work well, feel free to share it; maybe we can integrate it into PEFT.

For now, I'll close the issue, please re-open if something new comes up.