Thank you for suggesting this new method. To help me understand it better, could you point me to where MoSLoRA is implemented in the linked repo?
Thanks~ I am the author of this paper.
The method is simple: it inserts a mixer between lora_A and lora_B.
A boolean parameter, lora_use_mixer, controls whether the mixer is used.
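Conceptually, the update changes from ΔW = B A to ΔW = B M A, where M is a learnable r × r mixer. A minimal sketch (illustrative shapes only, not the repo code):

```python
import torch

d_in, d_out, r = 64, 64, 8

lora_A = torch.randn(r, d_in)      # down projection (r x d_in)
lora_B = torch.randn(d_out, r)     # up projection (d_out x r)
mixer = torch.randn(r, r)          # learnable r x r mixer (lora_AB)

delta_w_lora = lora_B @ lora_A             # vanilla LoRA update
delta_w_moslora = lora_B @ mixer @ lora_A  # MoSLoRA update with the mixer
```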
https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L113 and https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L166 show the initialization:
# creating the mixer (first link)
if lora_use_mixer:
    self.lora_AB[adapter_name] = nn.Linear(r, r, bias=False)

# initializing the mixer (second link), either orthogonal or Kaiming-uniform
if self.lora_use_mixer[adapter_name]:
    nn.init.orthogonal_(self.lora_AB[adapter_name].weight)
    # or
    nn.init.kaiming_uniform_(self.lora_AB[adapter_name].weight, a=math.sqrt(5))
There are two ways (orthogonal_ / kaiming_uniform_) to initialize lora_AB (the mixer).
For the forward process, please refer to https://github.com/wutaiqiang/MoSLoRA/blob/ebc05cddace5ce0d10c610f289c2490a03309492/visual_instruction_tuning/peft/tuners/lora/layer.py#L346
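In the forward pass, the r-dimensional activations are simply routed through the mixer before lora_B. A minimal, self-contained sketch of the idea (not the fork's actual layer, which also handles dropout, multiple adapters, merging, etc.):

```python
import torch
import torch.nn as nn

class MosLoraLinear(nn.Module):
    """Illustrative LoRA linear layer with an r x r mixer between A and B."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16, use_mixer: bool = True):
        super().__init__()
        self.base = base
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        self.lora_AB = nn.Linear(r, r, bias=False) if use_mixer else None  # the mixer
        self.scaling = alpha / r
        nn.init.zeros_(self.lora_B.weight)  # start as a no-op update, as in LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base(x)
        if self.lora_AB is not None:
            result = result + self.lora_B(self.lora_AB(self.lora_A(x))) * self.scaling
        else:
            result = result + self.lora_B(self.lora_A(x)) * self.scaling
        return result
```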
Thanks for the pointers. I had looked at your code and searched for "moslora", which returned no hits, so I didn't spot the changes.
So from my understanding, this is a very straightforward extension of LoRA that adds an extra weight matrix, multiplied in between LoRA A and B. The paper's finding is that with only a few extra parameters (the new weight is only r * r), performance can be greatly improved.
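To put the overhead in perspective (illustrative numbers, not taken from the paper):

```python
d, r = 4096, 16
lora_params = 2 * d * r            # A (r x d) + B (d x r) -> 131072
mixer_params = r * r               # extra r x r mixer    -> 256
print(mixer_params / lora_params)  # ~0.002, i.e. roughly 0.2% extra parameters
```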
I'm fine with you going ahead and creating a new PR that adds this method. Please make sure to extend the unit tests to cover it. There will probably be some extra work required that is not yet in your fork, because we have to ensure that PEFT knows about the additional trainable parameter.
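For example, a test along these lines could verify that the mixer is registered as trainable (a sketch only; lora_use_mixer and the lora_AB module name follow your fork and may change in the final PR):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def test_mixer_is_trainable():
    model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM")
    # lora_use_mixer is the flag from the fork, not (yet) part of PEFT's LoraConfig
    config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"], lora_use_mixer=True)
    model = get_peft_model(model, config)

    mixer_params = [
        n for n, p in model.named_parameters() if "lora_AB" in n and p.requires_grad
    ]
    assert mixer_params, "the r x r mixer should be registered as a trainable parameter"
```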
Got it, thanks.
Feature request
Paper: Mixture-of-Subspaces in Low-Rank Adaptation
Link: https://arxiv.org/abs/2406.11909
Reference Code: https://github.com/wutaiqiang/MoSLoRA/tree/main/visual_instruction_tuning/peft
Motivation
This paper decomposes LoRA into subspaces via structural re-parameterization and proposes a simple yet effective method, MoSLoRA, which employs a learnable mixer to fuse more subspaces, and to do so more flexibly.
MoSLoRA is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models.
Your contribution
I can help test the code or submit a PR.