hako-mikan / sd-webui-supermerger

model merge extension for stable diffusion web ui
GNU Affero General Public License v3.0

Feature requests: Add "AND gate" merge method #72

Open JilekJosef opened 1 year ago

JilekJosef commented 1 year ago

Basically it's the exact opposite of the add-difference method. Instead of checking whether the values (in models B and C) are different enough, you check whether they are similar enough and then place those values into model A. Purpose?

  1. Extracting concepts that 2 models have in common into another
  2. Fixing model errors / merging without transferring error values (well, this is probably more of a LoRA thing, since you don't need the third model here and you just preserve the similar vectors and trash the rest)

I was able to successfully test the second point with 2 LoRA epochs. I used torch.norm(A-B) to determine how different they are, which was a bit tricky to configure correctly (I didn't standardise the vectors first, which was probably part of the issue). There is probably a better method than the basic norm that I don't know about, since this was the first time I was doing something like this (I am mostly used to web development in Java). But in the end I believe it proved the concept. As for the first point, I don't know how well it will work, but I believe it shouldn't be hard for you to test, since it probably requires nothing more than a slight modification of the add-difference method.
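For illustration, a minimal sketch of that proof of concept, assuming the two LoRA epochs are plain dicts of tensors with matching keys; the function name, the averaging choice, and the threshold value are made up for the example, not supermerger code:

```python
import torch

def and_gate_loras(lora_a, lora_b, threshold=0.1):
    """Keep only the weights that two LoRA epochs agree on (illustrative sketch)."""
    merged = {}
    for key, a in lora_a.items():
        b = lora_b[key]
        # Normalize first so the threshold is independent of weight scale;
        # skipping this step is the tuning difficulty mentioned above.
        a_n = a / (a.norm() + 1e-12)
        b_n = b / (b.norm() + 1e-12)
        if torch.norm(a_n - b_n) <= threshold:
            merged[key] = (a + b) / 2           # similar enough: keep (averaged)
        else:
            merged[key] = torch.zeros_like(a)   # too different: trash it
    return merged
```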

hako-mikan commented 1 year ago

Very interesting suggestion. I will think about the implementation.

zethfoxster commented 1 year ago

wouldn't this basically spit out SD 1.5 with only the most common concepts the 2 models have? what exactly would be a practical use of this?

JilekJosef commented 1 year ago

> wouldn't this basically spit out SD 1.5 with only the most common concepts the 2 models have? what exactly would be a practical use of this?

Basically, concept extraction: in the case of models, A + (B AND C) replaces weights in A with the similar weights from B and C. In the case of LoRA, when you have multiple epochs you can do just C = A AND B, which would purify these LoRAs of the redundant things that differ between A and B.
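As a hedged sketch of the A + (B AND C) case, assuming the models are given as state dicts with matching keys and shapes (the function name and tolerance are illustrative only):

```python
import torch

def a_plus_b_and_c(model_a, model_b, model_c, atol=0.05):
    """Where B and C are similar enough, their averaged value overwrites A;
    everywhere else A is kept unchanged (illustrative sketch)."""
    out = {}
    for key, a in model_a.items():
        b, c = model_b[key], model_c[key]
        # Element-wise "AND gate": True where B and C agree within atol.
        agree = torch.isclose(b, c, rtol=0.0, atol=atol)
        out[key] = torch.where(agree, (b + c) / 2, a)
    return out
```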

le-khang commented 1 year ago

This is the exact idea I have in mind for LoRA merging. I noticed that when training 3 LoRAs (for the same person) using 3 different models and then merging them together, the new LoRA becomes very stable and flexible. I'm not saying it's the best, but it's something along those lines.

I think that if we can extract the exact concept without it being polluted by other elements, then we can freely increase its strength to improve quality & flexibility while also reducing the file size.

Deathawaits4 commented 9 months ago

Is there any news on this one? I think this could massively increase LoRA usability and bring it back up to Dreambooth level again.

JilekJosef commented 9 months ago

I have created this: https://github.com/JilekJosef/loli-diffusion-merger It's sort of a fork of supermerger. However, I have implemented the AND gate for models only, and the calculation method works on a single-value-vs-single-value basis; I believe it should be implemented at least at the tensor level to make it more usable. @Deathawaits4
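To illustrate the difference between the two granularities, here is a hypothetical sketch (not the fork's actual code): a value-level gate decides for every element independently, while a tensor-level gate accepts or rejects a whole tensor at once.

```python
import torch

def gate_value_level(b, c, atol=0.05):
    # Decide independently for every single value: keep where B and C agree.
    agree = torch.isclose(b, c, rtol=0.0, atol=atol)
    return torch.where(agree, (b + c) / 2, torch.zeros_like(b))

def gate_tensor_level(b, c, threshold=0.1):
    # Decide once for the whole tensor, from the relative norm of the difference.
    rel_diff = torch.norm(b - c) / (torch.norm(b) + torch.norm(c) + 1e-12)
    return (b + c) / 2 if rel_diff <= threshold else torch.zeros_like(b)
```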

ljleb commented 6 months ago

This suggestion is similar to a weighted geometric average:

import torch

def multiply_difference(a, b, c, alpha):
    # Work on the differences from C in the complex plane so that
    # negative values can be raised to fractional powers.
    a = torch.complex(a - c, torch.zeros_like(a))
    b = torch.complex(b - c, torch.zeros_like(b))
    # Weighted geometric mean of the two differences, added back onto C.
    res = a**(1 - alpha) * b**alpha
    return c + res.real

If either difference from C (A − C or B − C) is 0 for a parameter, the geometric term vanishes and the corresponding output parameter falls back to C's value. If both parameters are close, the output doesn't change much.
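A toy run of the function above illustrates the geometric-mean behaviour (numbers chosen only for the example, alpha = 0.5):

```python
import torch

a = torch.tensor([3.0, 5.0])
b = torch.tensor([3.0, 2.0])
c = torch.tensor([1.0, 1.0])

out = multiply_difference(a, b, c, alpha=0.5)
# Differences from c: a - c = [2, 4], b - c = [2, 1]
# Geometric means:    [sqrt(2*2), sqrt(4*1)] = [2, 2]  ->  out = c + [2, 2]
print(out)  # tensor([3., 3.])
```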