A couple of months ago, inspired by OFT finetuning, I was looking for an orthogonal matrix $Q$ that simultaneously rotates all neurons of a layer of $A$ to reduce the Frobenius norm of the difference between $A$ and $B$, without scaling or skewing the neurons.
I found a way to achieve this using the SVD of the product of the neuron matrices of $A$ and $B$ (one neuron per column); this is known as the orthogonal Procrustes problem.
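For reference, the closed-form solution can be sketched in a few lines of numpy (this is only a sketch of the math, not the sd-mecha code, which contains additional optimizations):

```python
import numpy as np

def procrustes_rotation(a, b):
    """Orthogonal Q minimizing ||Q @ a - b||_F (orthogonal Procrustes).

    a, b: weight matrices of the same layer, one neuron per column.
    """
    # SVD of the product of the two neuron matrices
    u, _, vt = np.linalg.svd(b @ a.T)
    return u @ vt
```

Because $Q$ is constrained to be orthogonal, applying it to $A$ can only rotate (or reflect) the neurons, never scale or skew them.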
Note that merging times are slow: ~9 minutes for SD1.5 models and ~45 minutes for SDXL models on my system (an RTX 3080). The code linked in sd-mecha contains all the lossless optimizations I was able to find.
Something I found works extremely well for style transfer is rotating towards $A$ the add-difference result (clipped between $A$ and $B$), with alpha=1.0:
```python
alpha = 1.0
beta = 0.0
recipe = rotate(clip(add_difference(a, b, c, alpha=1.0), a, b), a, alpha=alpha, beta=beta)
```
Parameters:
- `beta` is the linear interpolation between the aligned weights of $A$ and $B$: $A := (1-\beta) A + \beta Q^T B$
- `alpha` is the fractional rotation factor: $Q := Q^\alpha$, and then $QA$ is used as the final result
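This is my reading of the two parameters, sketched end to end in numpy (a hypothetical helper, not the sd-mecha code; the fractional power $Q^\alpha$ is taken through the complex eigendecomposition of $Q$):

```python
import numpy as np

def rotate_merge(a, b, alpha=1.0, beta=0.0):
    # Solve the orthogonal Procrustes problem: Q minimizes ||Q @ a - b||_F
    u, _, vt = np.linalg.svd(b @ a.T)
    q = u @ vt
    # beta: linear interpolation between A and the aligned B: (1-beta) A + beta Q^T B
    a = (1 - beta) * a + beta * (q.T @ b)
    # alpha: fractional rotation Q^alpha via the (complex) eigendecomposition of Q.
    # Eigenvalues of an orthogonal matrix lie on the unit circle, so w**alpha
    # interpolates each rotation angle between 0 (alpha=0) and its full value (alpha=1).
    w, v = np.linalg.eig(q)
    q_alpha = (v * w**alpha @ np.linalg.inv(v)).real
    return q_alpha @ a
```

As a sanity check, alpha=0.0 with beta=0.0 returns $A$ unchanged, and alpha=1.0 with beta=1.0 collapses to $B$ exactly, since $Q Q^T B = B$.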
I can provide merge samples if requested.
Is there any interest in having this in supermerger?
Link to implementation: https://github.com/ljleb/sd-mecha/blob/caa761c49e2c87b20f3d64a1357b60a5e60664a4/sd_mecha/merge_methods.py#L391 (Please overlook the odd input types; in practice they are just the tensors of $A$ and $B$ at the same key. They exist for code generation and type checking purposes in sd-mecha.)
Additional discussions on the topic:
https://github.com/s1dlx/meh/pull/50#discussion_r1429383612