s1dlx / meh

Merging Execution Helper
MIT License

feat: add weights clipping #5

Closed ljleb closed 1 year ago

ljleb commented 1 year ago

Add the -wc flag to clip the weights using max(|A|, |B|) as the threshold. Examples:
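The thresholding described above could be sketched as follows. This is an illustrative NumPy sketch, not the actual `-wc` implementation; the `hard_clip` name and per-array interface are assumptions:

```python
import numpy as np

def hard_clip(merged, a, b):
    # Hypothetical sketch: clamp each merged weight to a symmetric
    # per-element threshold of max(|A|, |B|), as described above.
    threshold = np.maximum(np.abs(a), np.abs(b))
    return np.clip(merged, -threshold, threshold)

# Elements already inside [-threshold, threshold] pass through unchanged;
# only weights pushed past either bound by the merge are cut.
```

In the real tool this would presumably run per tensor over the merged state dict.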

ljleb commented 1 year ago

The PR job tried to push to PyPI; the pipeline may need a bit of tweaking.

ljleb commented 1 year ago

I am not certain this is the best way to go; an alternative would be to add an entire merge method and call the merge function repeatedly with different merge methods to compose them together.

ljleb commented 1 year ago

In fact, soft clipping does not make sense for weighted average, tensor sum, weighted subtraction, sum twice, or triple sum. Soft clipping only works with weights difference, so I'll remove it and keep the flag for hard clipping, unless we decide to implement weights clipping as a merge method instead.

s1dlx commented 1 year ago

I think it's fine to have it this way

perhaps we raise a warning when clipping with a method it doesn't make sense for
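That could be as simple as a guard before merging. The method names and the `check_clip_compat` helper below are hypothetical, sketching the suggestion rather than the project's actual code:

```python
import warnings

# Hypothetical set of methods for which soft clipping is meaningful,
# per the discussion above (only weights difference qualifies).
SOFT_CLIP_OK = {"weights_difference"}

def check_clip_compat(method: str, soft: bool) -> None:
    # Warn rather than fail, so the merge still runs.
    if soft and method not in SOFT_CLIP_OK:
        warnings.warn(f"soft clipping is not meaningful for {method!r}")
```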

ljleb commented 1 year ago

I just realized that diff clipping (whether hard or soft) is in fact very different from weights hard clipping. I think I need to run more tests; it may or may not be as beneficial as hard clipping the weights directly.

ljleb commented 1 year ago

As far as my arbitrary testing goes, when using add difference with alpha=1.0, clipping weight differences achieves lower distortion than leaving the models unclipped. Weight difference clipping makes it possible to soft clip without destroying the model. Soft clipping weight differences has an effect similar to reducing alpha in add difference; however, the effect is non-linear and stronger on weights in A and B that differ a lot from C.
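A minimal sketch of that non-linear behaviour, assuming a tanh-style soft clip of the weight differences (the function name and exact shape are illustrative, not the PR's actual code):

```python
import numpy as np

def soft_clip_diff(diff, threshold):
    # Squash differences smoothly: near-linear for small |diff|,
    # saturating toward +/- threshold for weights that deviate a lot
    # from C -- hence the stronger effect on large differences.
    return threshold * np.tanh(diff / np.maximum(threshold, 1e-12))
```

Small differences pass through almost unchanged, while large ones are compressed toward the threshold, which matches the "similar to reducing alpha, but non-linear" observation.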

In this example:

Clipping profiles:

generation parameters:

```
magician male throwing a water spell
Negative prompt: close up, muscles, draft, worst quality, lowres, naked, loli, child, teen, young, art by xynon-bad-11k-2
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3071294903, Size: 512x768, Model hash: 47e7659eb9, Model: zovya_anything_dif100, ENSD: 31337, Script: X/Y/Z plot, X Type: Checkpoint name, X Values: "zovya_anything_dif100.safetensors [47e7659eb9],zovya_anything_dif100_clip.safetensors,zovya_anything_dif100_clip2.safetensors,zovya_anything_dif100_clip3.safetensors"
Used embeddings: xynon-bad-11k-2 [8df6]
```

(image: X/Y/Z plot comparing the original and clipped checkpoints)

It seems to be a lot harder to steer the absolute clipped model than A or B. I speculate this is because the weights of the resulting model are more saturated than in A or B.

Additionally, I have not yet been able to properly evaluate how close the clipped models are to the original A and B models. Finding 2 very different models and merging them together with weights clipping would help assess this aspect of the merge.

As this is very experimental, I am not sure it makes sense to add relative clipping functions. I'll add these here as soon as I can, but do let me know if you prefer to introduce them in a follow-up PR.