huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Add MoSLoRA #1878

Closed wutaiqiang closed 4 months ago

wutaiqiang commented 4 months ago

Feature request

Paper: Mixture-of-Subspaces in Low-Rank Adaptation

Link: https://arxiv.org/abs/2406.11909

Reference Code: https://github.com/wutaiqiang/MoSLoRA/tree/main/visual_instruction_tuning/peft

Motivation

This paper decomposes LoRA into subspaces via structural re-parameterization and proposes MoSLoRA, a simple yet effective method that employs a learnable mixer to fuse more subspaces more flexibly.

MoSLoRA is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models.
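As a rough illustration of the idea (a minimal sketch based on the description above, with hypothetical layer sizes and rank, not the reference implementation): vanilla LoRA computes the update as B A x, while MoSLoRA inserts a small learnable r x r mixer W between the two projections.

```python
import torch

d_in, d_out, r = 768, 768, 8      # hypothetical layer sizes and rank
x = torch.randn(1, d_in)

# vanilla LoRA update: delta(x) = B A x
A = torch.randn(r, d_in) * 0.01   # lora_A
B = torch.zeros(d_out, r)         # lora_B (zero-initialized)
lora_update = x @ A.T @ B.T

# MoSLoRA update: delta(x) = B W A x, with a learnable r x r mixer W
W = torch.eye(r)                  # mixer; a trainable parameter in practice
moslora_update = x @ A.T @ W.T @ B.T
```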

Your contribution

I can help test the code or submit a PR.

BenjaminBossan commented 4 months ago

Thank you for suggesting this new method. For my better understanding, could you point me to where in the linked repo MoSLoRA is implemented?

wutaiqiang commented 4 months ago

Thanks~ I am the author of this paper.

The method is simple: insert a learnable mixer between lora_A and lora_B.


I use a boolean parameter, lora_use_mixer, to control whether the mixer is used.

https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L113 and https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L166 show the initialization:

```python
if lora_use_mixer:
    self.lora_AB[adapter_name] = nn.Linear(r, r, bias=False)
```

```python
if self.lora_use_mixer[adapter_name]:
    # one of the two options below is used to initialize the mixer
    nn.init.orthogonal_(self.lora_AB[adapter_name].weight)
    # nn.init.kaiming_uniform_(self.lora_AB[adapter_name].weight, a=math.sqrt(5))
```

There are two ways (orthogonal_ / kaiming_uniform_) to initialize lora_AB (the mixer).

For the forward process, please refer to https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L346
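For readers who don't want to follow the links, the shape of the computation is roughly the following. This is a self-contained toy sketch; the class name MoSLoRALinear, the default rank, and the scaling handling are illustrative assumptions, not the fork's actual code.

```python
import math
import torch
import torch.nn as nn

class MoSLoRALinear(nn.Module):
    """Toy MoSLoRA layer: frozen base linear plus a B(mixer(A(x))) update."""

    def __init__(self, base: nn.Linear, r: int = 8, scaling: float = 1.0):
        super().__init__()
        self.base = base.requires_grad_(False)       # frozen pretrained weight
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_AB = nn.Linear(r, r, bias=False)   # the r x r mixer
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        self.scaling = scaling
        nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
        nn.init.orthogonal_(self.lora_AB.weight)     # one of the two mixer init options
        nn.init.zeros_(self.lora_B.weight)           # update starts at zero

    def forward(self, x):
        # y = W0 x + scaling * B(mixer(A(x)))
        return self.base(x) + self.lora_B(self.lora_AB(self.lora_A(x))) * self.scaling

layer = MoSLoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```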

BenjaminBossan commented 4 months ago

Thanks for the pointers. I had looked at your code and searched for "moslora", which had no hits, so I didn't spot the changes.

So from my understanding, this is a very straightforward extension of LoRA that adds an extra weight that is multiplied between LoRA A and B. The paper finds that with only a few extra parameters (the new weight is only r * r), performance can be greatly improved.
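To make the overhead concrete (a back-of-the-envelope example with assumed dimensions, not numbers from the paper):

```python
import torch

# overhead for one 4096 x 4096 projection with r = 16 (assumed sizes)
d, r = 4096, 16
lora_params = 2 * d * r            # A (r x d) + B (d x r) = 131072
mixer_params = r * r               # extra r x r mixer     = 256
print(mixer_params / lora_params)  # ~0.002, i.e. ~0.2% additional trainable parameters

# the extra weight still folds into a single d x d delta
A, B, M = torch.randn(r, d), torch.randn(d, r), torch.randn(r, r)
delta_W = B @ M @ A                # shape (d, d), addable to the frozen base weight
```

Since B @ M @ A collapses to a single d x d matrix, merging the adapter into the base weight should work as it does for plain LoRA, so inference cost is unchanged.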

I'm fine with you going ahead and creating a PR that adds this method. Please make sure to extend the unit tests to cover the new method. There will probably be some extra work required that is not yet in your fork, because we have to ensure that PEFT knows about the additional trainable parameter.

wutaiqiang commented 4 months ago

Got it, thanks.