A couple of months ago, inspired by OFT finetuning, I was looking for an orthogonal matrix $Q$ that simultaneously rotates all neurons of a layer of $A$ to reduce the Frobenius norm of the difference between $A$ and $B$, without scaling or skewing the neurons.
I found a way to achieve this using the SVD of the product of the neuron matrices of $A$ and $B$ (one neuron per column); this is known as the orthogonal Procrustes problem.
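For reference, the closed-form solution can be sketched in a few lines of numpy (this is only a sketch of the math, not the sd-mecha code, which contains additional optimizations):

```python
import numpy as np

def procrustes_rotation(a, b):
    """Orthogonal Q minimizing ||Q @ a - b||_F (orthogonal Procrustes).

    a, b: weight matrices of the same layer, one neuron per column.
    """
    # SVD of the product of the two neuron matrices
    u, _, vt = np.linalg.svd(b @ a.T)
    return u @ vt
```

Because $Q$ is constrained to be orthogonal, applying it to $A$ can only rotate (or reflect) the neurons, never scale or skew them.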
Note that merging times are slow: ~9 minutes for SD1.5 models and ~45 minutes for SDXL models on my system (an RTX 3080). The code linked in sd-mecha contains all the lossless optimizations I was able to find.
Something I found works extremely well for style transfer is rotating towards $A$ the add-difference result (clipped between $A$ and $B$), with alpha=1.0:
```python
alpha = 1.0
beta = 0.0
recipe = rotate(clip(add_difference(a, b, c, alpha=1.0), a, b), a, alpha=alpha, beta=beta)
```
Parameters:
- `beta` is the linear interpolation between the aligned weights of $A$ and $B$: $A := (1-\beta) A + \beta Q^T B$
- `alpha` is the fractional rotation factor: $Q := Q^\alpha$, and then $QA$ is used as the final result
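This is my reading of the two parameters, sketched end to end in numpy (a hypothetical helper, not the sd-mecha code; the fractional power $Q^\alpha$ is taken through the complex eigendecomposition of $Q$):

```python
import numpy as np

def rotate_merge(a, b, alpha=1.0, beta=0.0):
    # Solve the orthogonal Procrustes problem: Q minimizes ||Q @ a - b||_F
    u, _, vt = np.linalg.svd(b @ a.T)
    q = u @ vt
    # beta: linear interpolation between A and the aligned B: (1-beta) A + beta Q^T B
    a = (1 - beta) * a + beta * (q.T @ b)
    # alpha: fractional rotation Q^alpha via the (complex) eigendecomposition of Q.
    # Eigenvalues of an orthogonal matrix lie on the unit circle, so w**alpha
    # interpolates each rotation angle between 0 (alpha=0) and its full value (alpha=1).
    w, v = np.linalg.eig(q)
    q_alpha = (v * w**alpha @ np.linalg.inv(v)).real
    return q_alpha @ a
```

As a sanity check, alpha=0.0 with beta=0.0 returns $A$ unchanged, and alpha=1.0 with beta=1.0 collapses to $B$ exactly, since $Q Q^T B = B$.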
I can provide merge samples if requested.
Is there any interest in having this in supermerger?
Link to implementation: https://github.com/ljleb/sd-mecha/blob/caa761c49e2c87b20f3d64a1357b60a5e60664a4/sd_mecha/merge_methods.py#L391 (Please overlook the odd input types; in practice they are just the tensors of $A$ and $B$ at the same key. They exist for code generation and type checking purposes in sd-mecha.)
Additional discussions on the topic:
https://github.com/s1dlx/meh/pull/50#discussion_r1429383612