I followed those two papers back to find this: https://arxiv.org/abs/2004.04690
and then this links all the ideas together: https://arxiv.org/abs/2405.17484
They are all to do with image-model LoRAs, but equation (5) clearly shows their "Householder Reflection Adaptation" is the same thing with a different parametrisation:
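Roughly (paraphrasing from memory rather than quoting the paper), they adapt the frozen weight with a chain of Householder reflections:

W' = W (I - 2 u_1 u_1^T / ||u_1||^2) ... (I - 2 u_r u_r^T / ||u_r||^2)

and since each factor differs from the identity by a rank-1 term, the whole product expands to I + X with rank(X) <= r, i.e. a rank-r "multiplicative LoRA" in disguise.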
I'll close this now as I think I have my answer, and will spend a couple of days reading all these! :)
@danielhanchen
Sorry to raise an issue, but I don't know any other way to contact you and thought if anyone knows about this it might be you! :)
I just wondered if you've come across any research related to "multiplicative LoRAs" for LLMs?
I have found these two papers that use the same idea for image models (where LoRA seems way more popular):
https://arxiv.org/abs/2306.07280 https://arxiv.org/abs/2311.06243
But they parametrise using a block-diagonal matrix instead of a pair of factor matrices, which likely makes sense for images (where block-diagonal structure relates to small-window 1D convolution filters) but not for LLMs...
EDIT: I actually just found that these two papers are implemented in the PEFT library as BOFT and OFT, but I assume these are really aimed at image models only?
I'm specifically interested in the 2-matrix factorisation of these, and I outlined why I think this would be particularly interesting for the outputs of the `down_proj` matrix here: https://github.com/turboderp/exllamav2/discussions/500#discussioncomment-10532104
But I would like to save myself a lot of work if this has already been implemented for LLMs, or if there is already a PEFT LoRA format that stores these "multiplicative LoRAs".
Have you come across this before?
I think I can easily write the code to extract these in a similar way to Thomas Gauthier's LoRD - just with a couple of extra steps, where A is the base-model matrix and B is the delta of the fine-tuned matrix:
A + B = A (I + X) = A I + A X = A + A X
B = A X
A^+ B = A^+ A X = X
(or, more likely, solve A X = B directly with least-squares instead of forming the pseudo-inverse).
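A minimal sketch of that extraction step in PyTorch (the function name is mine; A and B here are a single layer's base weight and fine-tuned delta, oriented as in the equations above):

```python
import torch

def solve_multiplicative_factor(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Solve A @ X = B for X in the least-squares sense,
    i.e. X = pinv(A) @ B without forming the pseudo-inverse explicitly."""
    # NB: on CUDA, lstsq requires A to have full rank ('gels' driver);
    # the default CPU driver ('gelsy') also handles rank-deficient A.
    return torch.linalg.lstsq(A.float(), B.float()).solution
```

One caveat: PyTorch's `nn.Linear` stores its weight as (out_features, in_features) and computes y = x @ W.T, so depending on which side of the weight you want the adaptation to act, you may need to transpose before solving.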
Then take the SVD of X:
X = U S V^T
truncate it to rank r:
X' = U' S' V'^T
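A sketch of those two steps (note `torch.linalg.svd` returns X = U @ diag(S) @ Vh with singular values sorted in descending order, so Vh plays the role of V'^T above):

```python
def truncated_svd(X: torch.Tensor, r: int):
    # Thin SVD: X = U @ diag(S) @ Vh.
    U, S, Vh = torch.linalg.svd(X.float(), full_matrices=False)
    # Keeping the top-r triplets gives the best rank-r approximation of X
    # in the Frobenius norm (Eckart-Young): X' = U' @ diag(S') @ V'^T.
    return U[:, :r], S[:r], Vh[:r, :]
```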
From there, either save X' as a custom "multiplicative LoRA" (which would need custom code to merge back onto A),
or multiply it out:
B' = A X' = A U' S' V'^T
Then take the SVD of B' and save a really-high-rank version of it in the same way LoRD already does, meaning it would still work as a standard "additive LoRA" but would unfortunately take up far more space than needed, due to B' being close to full rank.
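A sketch of that fallback path (names are mine; `rank_out` is whatever rank you're willing to spend on the additive version):

```python
def to_additive_lora(A: torch.Tensor, U_r, S_r, Vh_r, rank_out: int):
    # Multiply the truncated multiplicative factor back out:
    # B' = A @ X' = A @ (U' diag(S') V'^T).
    B_prime = A.float() @ (U_r * S_r) @ Vh_r
    # SVD B' and store it as a standard additive lora_B @ lora_A pair,
    # absorbing the singular values into lora_B.
    U2, S2, Vh2 = torch.linalg.svd(B_prime, full_matrices=False)
    lora_B = U2[:, :rank_out] * S2[:rank_out]
    lora_A = Vh2[:rank_out, :]
    return lora_A, lora_B  # delta ~= lora_B @ lora_A
```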