hako-mikan / sd-webui-supermerger

model merge extension for stable diffusion web ui
GNU Affero General Public License v3.0

Using OFT to rotate the neurons of A towards B #347

Open ljleb opened 9 months ago

ljleb commented 9 months ago

A couple of months ago, inspired by OFT finetuning, I was looking for an orthogonal matrix $Q$ that rotates all neurons of a layer of $A$ simultaneously to reduce the Frobenius norm of the difference between $A$ and $B$, without scaling or skewing the neurons.

A while ago, I found a way to achieve this using the SVD of the product of the neuron matrices of $A$ and $B$ (with one neuron per column). This is known as the orthogonal Procrustes problem.
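For anyone unfamiliar with the orthogonal Procrustes problem, here is a minimal NumPy sketch of the closed-form solution. It is not the sd-mecha implementation, only an illustration under stated assumptions: neurons are stored one per column, the objective is $\min_Q \lVert QA - B \rVert_F$ subject to $Q^\top Q = I$, and the function name `procrustes_rotation`, the shapes, and the random test data are all made up for the example. Note that the solution can include a reflection (determinant $-1$); correcting for that is not shown here.

```python
import numpy as np

def procrustes_rotation(a, b):
    """Return the orthogonal Q minimizing ||Q @ a - b||_F.

    Assumes a and b have the same shape, with one neuron per column.
    """
    # SVD of the product of the two neuron matrices.
    u, _, vt = np.linalg.svd(b @ a.T)
    # Dropping the singular values leaves a purely orthogonal transform,
    # so neurons are turned but never scaled or skewed.
    return u @ vt

# Hypothetical test data: b is an exactly rotated copy of a.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 32))  # 32 neurons of dimension 8
q_true, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
b = q_true @ a

q = procrustes_rotation(a, b)
rotated = q @ a  # neurons of a turned towards those of b
```

In this toy case `b` really is an orthogonal transform of `a`, so `rotated` matches `b` up to floating-point error. With real checkpoints the match is only approximate, and the norm of each neuron of `a` is preserved either way.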

Link to implementation: https://github.com/ljleb/sd-mecha/blob/caa761c49e2c87b20f3d64a1357b60a5e60664a4/sd_mecha/merge_methods.py#L391 (Please overlook the odd input types; in practice they are just the tensors of $A$ and $B$ at the same key. They exist for code generation and type checking purposes in sd-mecha.)

Additional discussions on the topic:

https://github.com/s1dlx/meh/pull/50#discussion_r1429383612

Note that merging is slow. It takes ~9 minutes for SD1.5 models and ~45 minutes for SDXL models on my system (I have an RTX 3080). The code linked in sd-mecha contains all the lossless optimizations I was able to find.

Something I found works extremely well for style transfer is taking the add difference with alpha=1.0, clipping it between $A$ and $B$, and rotating the result towards $A$:

alpha = 1.0
beta = 0.0
recipe = rotate(clip(add_difference(a, b, c, alpha=1.0), a, b), a, alpha=alpha, beta=beta)

Parameters:

I can provide merge samples if requested.

Is there any interest in having this in supermerger?