unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

"multiplicative LoRAs" for LLMs? #991

Closed: jukofyork closed this 2 months ago

jukofyork commented 2 months ago

@danielhanchen

Sorry to raise an issue, but I don't know any other way to contact you, and thought that if anyone knows about this it might be you! :)

I just wondered if you've come across any research related to "multiplicative LoRAs" for LLMs?

I have found these two papers that use the same idea for image models (where LoRA seems way more popular):

https://arxiv.org/abs/2306.07280 https://arxiv.org/abs/2311.06243

But they parametrise the update using a block-diagonal matrix instead of a pair of factor matrices, which likely makes sense for images (where block-diagonal structure is related to small-window 1D convolution filters), but not for LLMs...
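Just to pin down what I mean by that distinction (my own notation; W is the frozen weight, and the side/transpose conventions vary between the papers):

```latex
% standard additive LoRA: a low-rank delta added to the frozen weight
W' = W + B A

% "multiplicative LoRA": a low-rank update that multiplies the frozen weight
W' = W (I + B A)

% OFT / BOFT instead constrain the multiplicative factor to be a
% block-diagonal (orthogonal) matrix rather than a low-rank product
W' = W R, \qquad R = \operatorname{diag}(R_1, \dots, R_k)
```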

EDIT: I actually just found that these two papers are implemented in the PEFT library: boft and oft, but I assume these are really aimed at image models only?
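For anyone landing here later, a minimal sketch of using that via PEFT (assuming the current OFTConfig API; exact argument names may differ between versions, and the model name is just an example):

```python
from transformers import AutoModelForCausalLM
from peft import OFTConfig, get_peft_model  # BOFTConfig is the butterfly-factorised variant

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# OFT/BOFT parametrise the multiplicative factor as a block-diagonal
# (orthogonal) matrix, not the pair of low-rank factor matrices I'm after.
config = OFTConfig(
    r=8,                                  # number of orthogonal blocks (check the PEFT docs for exact semantics)
    target_modules=["q_proj", "v_proj"],  # which weights get the multiplicative update
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```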

I'm specifically interested in the 2-matrix factorisation of these, and I outlined why I think this would be particularly interesting for the outputs of the down_proj matrix here:

https://github.com/turboderp/exllamav2/discussions/500#discussioncomment-10532104

But I would like to save myself a lot of work if this has already been implemented for LLMs, or if there is already a PEFT LoRA format that stores these "multiplicative LoRAs".

Have you come across this before?


I think I can easily write the code to extract these in a similar way to Thomas Gauthier's LoRD - just with a couple of extra steps, where A is the base model matrix and B is the delta of the fine-tuned matrix:

A + B = A (I + X) = A I + A X = A + A X
=> B = A X
=> A^+ B = A^+ A X = X

(or likely solve using least-squares instead of the pseudo-inverse)
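For concreteness, a minimal PyTorch sketch of that step (W_base and W_tuned are placeholder names, and the layout may need transposing depending on whether the weight is stored as (out_features, in_features)):

```python
import torch

# W_base: frozen base-model matrix A; W_tuned: fine-tuned matrix A + B (same shape).
delta = (W_tuned - W_base).float()

# We want X such that W_tuned ≈ W_base @ (I + X), i.e. delta ≈ W_base @ X.
# Least-squares solve (usually better conditioned than forming the pseudo-inverse):
X = torch.linalg.lstsq(W_base.float(), delta).solution

# Equivalent pseudo-inverse form:
# X = torch.linalg.pinv(W_base.float()) @ delta
```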

and then take the SVD of X:

X = U^T S V

truncate it:

U'^T S' V'

and either save this as a custom "multiplicative LoRA" (which would need custom code to merge back onto A),

or multiply it out:

B' = A U'^T S' V'

Then take the SVD of B' and save a really high-rank version of it in the same way as LoRD already does, meaning it would still work as a standard "additive LoRA", but unfortunately take up way more space than needed due to B' being close to full rank.
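A rough sketch of those remaining steps (again just illustrative; note torch.linalg.svd returns X = U diag(S) Vh, so the transposes sit a little differently from my notation above, and the rank r is an arbitrary choice here):

```python
import torch

r = 64  # truncation rank for the "multiplicative LoRA" (arbitrary here)

# SVD of X, truncated to rank r.
U, S, Vh = torch.linalg.svd(X, full_matrices=False)
U_r, S_r, Vh_r = U[:, :r], S[:r], Vh[:r, :]

# Option 1: save (U_r, S_r, Vh_r) as a custom "multiplicative LoRA".
# Merging it back onto the base weight needs custom code:
#   W_merged = W_base @ (torch.eye(X.shape[0]) + U_r @ torch.diag(S_r) @ Vh_r)

# Option 2: multiply it out into an ordinary additive delta...
B_prime = W_base @ (U_r * S_r) @ Vh_r   # == W_base @ U_r @ diag(S_r) @ Vh_r

# ...then SVD B_prime and store it like a standard (but high-rank) additive LoRA,
# the same way LoRD does; B_prime is typically close to full rank, so the rank
# kept here has to be much larger than r.
U2, S2, Vh2 = torch.linalg.svd(B_prime, full_matrices=False)
```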

jukofyork commented 2 months ago

I followed those two papers back to find this: https://arxiv.org/abs/2004.04690

and then this links all the ideas together: https://arxiv.org/abs/2405.17484

They are all to do with image-model LoRAs though.


But equation (5) clearly shows that their "Householder Reflection Adaptation" is the same multiplicative idea with a different parametrisation.

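For my own reference, the update in that equation is (as I read it, and up to which side it multiplies on) the frozen weight times a chain of r Householder reflections:

```latex
W' = W \prod_{i=1}^{r} \left( I - 2 \, \frac{u_i u_i^{\top}}{\lVert u_i \rVert^{2}} \right)
```

i.e. still the W (I + X) form from above, just with X constrained so that the overall factor stays orthogonal.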

I'll close this now as I think I have my answer, and will spend a couple of days reading through all of these! :)