vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: git-re-basin potential improvement for model merge #1176

Closed DirtyHamster closed 8 months ago

DirtyHamster commented 1 year ago

Feature description

I spent a fair portion of the week reading about the various types of block merging methods currently available for models. One paper in particular, entitled "GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES," really caught my attention, and I thought you might find it interesting as well. Paper is available here: https://arxiv.org/abs/2209.04836
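
For anyone skimming, the core trick in the paper is to find a permutation of one model's hidden units that best aligns it with the other model's units before averaging, since two independently trained networks can compute the same function with their neurons in a different order. A minimal sketch of the weight-matching idea for a single hidden layer (illustrative only, nowhere near the full SD-scale implementation; assumes torch and scipy):

    import torch
    from scipy.optimize import linear_sum_assignment

    def align_hidden_layer(w1_a, w2_a, w1_b, w2_b):
        # w1_*: [hidden, in] weights going into the hidden layer
        # w2_*: [out, hidden] weights coming out of the hidden layer
        # Entry (i, j) scores how similar A's unit i is to B's unit j.
        cost = w1_a @ w1_b.T + w2_a.T @ w2_b
        _, cols = linear_sum_assignment(cost.numpy(), maximize=True)
        perm = torch.as_tensor(cols)
        # Permute B's hidden units (rows in, columns out) to line up with A's.
        return w1_b[perm, :], w2_b[:, perm]

    # After alignment, a plain 50/50 average no longer blends mismatched
    # features: merged_w1 = 0.5 * w1_a + 0.5 * w1_b_aligned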

Is there a reasonable way to test their method against what we already use, to see if it would be an improvement? What method are you currently using, so I can better understand what I'd be testing against? I had also thought of testing against the paper's own proof of method, if that is available. I figured I'd write this up in case someone else might be interested in testing it too.

My thought was this could potentially be added as an "auto" option under interpolation. As the weights are auto-guided, I thought this might follow along with your idea of Ease-of-Use. Some of the more manual methods have users setting each of the independent in and out weight values for the model blocks, which would also be nice to be able to do without leaving the core UI.

(conversation went on from there)

I did a little extra searching around from that: Just a simple GUI for the code: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui

GUI used in some of the testing: https://github.com/axsddlr/sd-gui-meh

Code explored: https://github.com/samuela/git-re-basin https://github.com/themrzmaster/git-re-basin-pytorch They also have some code in the pull section partially dealing with safetensors: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui/pull/1

Code used in testing: https://github.com/s1dlx/meh

Results were brought up from the comments below after the testing method was agreed on:


Model 1: https://huggingface.co/Deltaadams/HD-22 fp32
Model 2: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models pruned from full trainable ema models to fp32 no-ema and fp16 no-ema prior to testing.

Testing sampler and size settings: DPM++ 2M Karras @ 20 steps, CFG scale 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts used: "a headshot photographic portrait of a woman", "a cat as a DJ at the turntables"

Testing regimen: (multiplier to be run from 0.1 to 0.9)

base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16 xyz_grid-0032-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32 xyz_grid-0031-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 xyz_grid-0027-1897848000-a cat as a DJ at the turntables xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

The git-re-basin side will be similarly mirrored: (weight value set at .5:.5, iteration value to be run from 1 to 10)

Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0016-1897848000-a cat as a DJ at the turntables xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}

xyz_grid-0019-1897848000-a cat as a DJ at the turntables xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0024-1897848000-a cat as a DJ at the turntables xyz_grid-0025-1897848000-a headshot photographic portrait of a woman

Version Platform Description

Latest published version: e04867997e8903b9f44b75d073ef0be8c3159c12 2023-05-25T21:13:56Z

DirtyHamster commented 1 year ago

@vladmandic I decided to run the fp32 pruned version at iteration 10 without restarting; it ran without problems. RAM peaked at 31gb. I started with the OS and programs using 9gb of RAM; when it finished running I was at 5gb. The OS seems to have correctly put other things into a lower-memory mode. Thinking about it, I figured this was a better stress test. I'm going to try to run off the rest of the 9 iterations tomorrow, as well as play with the -pr/--prune arg on the full model with ema in it, though I'll probably just run that at 10.

s1dlx commented 1 year ago

@DirtyHamster I'm in touch with axsddlr and I know they are going to work on it. I may cook something up in the meantime, but I'd rather have the GUI separated from the library repository

  1. ema should be pruned away from the process. For reference, a fp16 pruned model is ~2.2GB
  2. not yet. I'm playing with an automatic checker to stop iterating when the weights no longer change (see the sketch after this list). I tested at different values (10, 50, 100) and got different models every time. They were all OK, so I'm not sure how easy it is to pick a number of iterations. In the paper they reach ~250 iterations
  3. It's technically possible but not implemented (PRs are welcome :) )
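
(For item 2, a minimal sketch of what such an auto-stop check could look like; one_iteration is a hypothetical stand-in for a single permutation/merge pass, not sd-meh's actual API:)

    import torch

    def iterate_until_converged(one_iteration, state, max_iters=250, tol=1e-6):
        # state: dict of tensors (the model weights being merged)
        for i in range(max_iters):
            new_state = one_iteration(state)
            # Largest absolute weight change anywhere in the model this pass.
            delta = max((new_state[k] - state[k]).abs().max().item()
                        for k in new_state)
            state = new_state
            if delta < tol:
                print(f"stopped after {i + 1} iterations (max delta {delta:.2e})")
                break
        return state
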
DirtyHamster commented 1 year ago

@vladmandic Results from the fp32 iteration cycle: xyz_grid-0019-1897848000-a cat as a DJ at the turntables xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

I don't notice as much of a difference in these as I did in the fp16s; I'm kind of surprised. It could just be an effect of the extra accuracy of the model though. Next up: the prune test and the fp16 x fp32 tests.

@s1dlx I think keeping them separate is fine. No need to trouble yourself, especially if axsddlr is already working on it.

2. not yet. I'm playing with an automatic checker to stop iterating when the weights no longer change. I tested at different values (10, 50, 100) and got different models every time. They were all OK, so I'm not sure how easy it is to pick a number of iterations. In the paper they reach ~250 iterations

When I was looking at just 1-10 I kind of thought it would be hard to pick too. Were your runs going to 50 and 100 much different from the runs ending at 10? I like the idea of an automatic checker too.

3. It's technically possible but not implemented (PRs are welcome :) )

It was mostly a question of curiosity as I realized how many iterations they had done. I could imagine myself at 245 bluescreening. lol

DirtyHamster commented 1 year ago

@vladmandic results from the mixed fp32 x fp16 models outputted as fp16: xyz_grid-0024-1897848000-a cat as a DJ at the turntables xyz_grid-0025-1897848000-a headshot photographic portrait of a woman

I think this might be one of the best runs yet when looking at the comparison against what we currently use in increments from 0.1 to 0.9, at least on the cat prompt. I did pull an update for meh, so I'll be double-checking the previously generated merges just to see if that had anything to do with it.

The prune test went fine; I just didn't find it necessary to do outputs for it. It can pass models with their ema that way, not that people should be retaining their models' emas after merging. It does get rid of the potential for errors though.

I still have to do time tests and keep a closer watch on hardware usage, but I'll do those at iteration 10 at fp16 and fp32 while I do the double checking that I mentioned above.

vladmandic commented 1 year ago

hmm, why are all the results looking the same? there was one great set of results 5 days ago, since then i'm at a bit of a loss?

DirtyHamster commented 1 year ago

@vladmandic That's probably because the last few posted are the git-re-basin iteration outputs, focused on comparing the traditional 0.5:0.5 merge vs the RB method using .5 at each iteration count, so all the changes are a lot smaller. You have to take the set you just mentioned, look at model A full, 0.5, and model B full, and then compare that against the iterations across the re-basin merge, which is done at a default .5 at rbi {1,2...10}.

I mentioned that it had to be 0.5s here: https://github.com/vladmandic/automatic/issues/1176#issuecomment-1586410540 though I didn't go into great detail. Initially I tried running 0.9:0.1 weighted and ran off some iterations, then went to 0.8:0.2 at iteration. I also tried using smaller increments, 0.95:0.05... Every single attempt that was not 0.5:0.5 ended up with model A being shown in the first and second image and model B in the rest of the images. Then I re-read the paper and realized it's just trying to get the best 50/50 mix. That's still really useful, though not as variable as I initially thought it would be, but still extremely usable in a merge regimen.

I've been gathering the individual results at the bottom of the first message so they are not as randomly positioned across the thread. I still have to fill a few spots that were generated individually rather than as grids. I'll bring a copy of the gathered results down here once I have all the results together, so it's easier to refer back to in discussion.

vladmandic commented 1 year ago

ahh, serves me right when i read updates on my phone just before going to bed. thanks for setting me straight. and yes, i do remember previous conversations :)

DirtyHamster commented 1 year ago

No worries... I do that all the time; between having multiple projects going and insomnia I can be all over the place some days too. That's also why I want to double-check everything. The last run had me wondering why the cat's arms are less human-like.

DirtyHamster commented 1 year ago

This was the mixed fp32 x fp16 set output as fp16 (normal merge):

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 xyz_grid-0027-1897848000-a cat as a DJ at the turntables xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

DirtyHamster commented 1 year ago

@vladmandic My todo list, just so we're in sync on what I think I can get done in the coming week:

Over the weekend + Monday: re-output grids for "a headshot photographic portrait of a woman" for base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 and base-fp32+custom#1-fp32, base-fp32+custom#2-fp32. Double-check tests 1 and 2 on the re-basin side. Bring results down for discussion.

Coming week: Looking for a good hardware logger with selectable fields (mine doesn't have selectable fields for logging); suggestions are appreciated. Stopwatch time trials at iteration 10. Closer look at VRAM and RAM usage at iteration 10.

Extras & some speculation:

I want to try the following with stopwatch times (these can only get done if I have time to babysit the computer): attempt iteration runs at 15,20,25,30 or 20,30,40,50; suggestions are appreciated, though my intent is not to go to 250.

This extra is pretty much just to get a better handle on iterations and the permutation outcomes at values beyond 10. However, I do suspect that models might be closer together than expected, given the cost of training and the limited number of training start points against current hardware requirements. So shorter iteration counts might be more plausible now, and iteration counts might go up over time as home training becomes more readily available.

Note 1: meh had an update to 0.7.0, which is part of the reason for some of the double checks. They added some other interesting options too, such as auto presets for block weights; have a look: https://github.com/s1dlx/meh/wiki/Presets @s1dlx Can you give us any more insight into what was implemented in this addition, and how?

Note 2: Model architecture: Would it be possible to offer, as an option, adding modular architecture components as a simple action: inpaint, pix2pix... From looking at model toolkit (https://github.com/arenasys/stable-diffusion-webui-model-toolkit) I think these could be stored as component defaults off to the side and just be added in pre-final-save. I'm focusing on wider usability here, looking at what is commonly asked for on model-sharing sites. This is mostly an afterthought though, and "out of scope" is a fair answer.

Thoughts and input are very welcome. I always expect delays and am not in a hurry btw.

vladmandic commented 1 year ago

Looking for a good hardware logger

What's the use-case you need it for?

I want to try the following with stopwatch times (These can only get done if I have time to babysit the computer though.)

Why do you need to babysit the computer through it?

Attempt iteration runs at 15,20,25,30 or 20,30,40,50: Suggestions are appreciated? Though my intent is not to go to 250.

Don't know if this uses linear interpolation or if it's just linear iterations as-is? If it's using interpolation, 99 is pretty much the highest anyhow.

auto presets for block weights

Those look really interesting from the theory perspective.

Would it be possible to do as an option Add in as a simple action modular architecture

Not sure I follow.

Thoughts and input are very welcome. I always expect delays and am not in a hurry btw.

My $0.02 - we should probably define what the exit criteria are?

DirtyHamster commented 1 year ago

What's the use-case you need it for?

I figured I could log the runs at each iteration so you could see usage across CPU, RAM, VRAM...

Why do you need to babysit the computer through it?

I have no current way (unless taught one) to monitor the starting and stopping depending on the runtime.

Don't know if this uses linear interpolation or is it just linear interations as-is? If its using interpolation, 99 is pretty much highest anyhow.

As far as I understand, it assesses the weight deviance between the two models at the start and end of each iteration. I wouldn't consider the method linear, since each permutation is based on the previous run's output figures. The paper goes iteration by iteration, measuring the weights and estimation methods. As there is no back-checking across previous iterations, it should either improve or degrade as iterations continue. A user should be able to continue iterating unless they stop manually or that estimation reaches a point of no change. There's a very brief chat about an auto-stop function a few posts up: https://github.com/vladmandic/automatic/issues/1176#issuecomment-1588762518

Not sure I follow.

A base model (v1) needs 3 parts to work:

SD-v1:
--VAE-v1
---VAE-v1-SD (baking would replace this)
--CLIP-v1
---CLIP-v1-SD
--UNET-v1
---UNET-v1-SD

Things we could also offer to add in: UNET-v1-Inpainting, UNET-v1-Pix2Pix, SD-v1-ControlNet, UNET-v2-Inpainting, UNET-v2-Depth. These are just components that can be added to a base model; I think they could potentially be added in as easily as we do a clip fix.
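
A rough sketch of the idea (hypothetical helper; the key prefix and donor paths are illustrative, and in practice the architectures differ, e.g. the inpainting UNet has extra input channels, so this is just the shape of it):

    from safetensors.torch import load_file, save_file

    def add_component(merged_path, donor_path, prefix, out_path):
        """Copy all tensors under `prefix` from a donor checkpoint into a merged model."""
        merged = load_file(merged_path)
        donor = load_file(donor_path)
        for key, tensor in donor.items():
            if key.startswith(prefix):
                merged[key] = tensor  # insert/overwrite the component weights
        save_file(merged, out_path)

    # e.g. add_component("merged.safetensors", "sd-v1-5-inpainting.safetensors",
    #                    "model.diffusion_model.", "merged-inpaint.safetensors")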

My $0.02 - we should probably define what the exit criteria are?

We never defined that, and to me it's OK either way; I'm just not sure whether I should suspend work for a decision. Should this be limited to a better 50:50 merge, or do we want it to be more usable across the user base as a whole? I'm just not sure if we want to cover things like model architecture inputs and such, as mentioned above. Presets for weights I think are fair... I'm not sure what is too far, if you know what I mean, while at the same time I know what is useful.

Do we have any exit or end-of-scope suggestions for this first improvement attempt?

For me it would be to take most of it to a logical end, unless the extra work would hold back the improvement.

s1dlx commented 1 year ago

@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing a block merge. They are implemented in presets.py
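
(If I follow, that would be a base ratio plus one per UNet block — 1 base + 12 IN + 1 MID + 12 OUT for SD1 — something like the sketch below, with made-up values; see presets.py in the meh repo for the real ones:)

    # A block-weight preset: one merge ratio per UNet block instead of a
    # single global multiplier. Values are invented for illustration.
    EXAMPLE_PRESET = [
        0.0,                                                          # base alpha
        1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0,  # IN00-IN11
        0.0,                                                          # M00
        0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,  # OUT00-OUT11
    ]
    assert len(EXAMPLE_PRESET) == 26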

regarding logging, meh already logs vram for rebasin

in the dev branch there’s some initial proper logging added. That would make it in release 0.8

also, a lot of the experiments you are doing are also being done in our small discord server. Perhaps you can join and compare results

DirtyHamster commented 1 year ago

experiments you are doing are also being done in our small discord server

Sure what's the discord address?

vladmandic commented 1 year ago

just an idea of exit criteria:

s1dlx commented 1 year ago

experiments you are doing are also being done in our small discord server

Sure what's the discord address?

I’ll add it to the readme but it’s the same we have for Bayesian-merger

https://github.com/s1dlx/sd-webui-bayesian-merger

DirtyHamster commented 1 year ago

just an idea of exit criteria:

  • does it work? yes.
  • how does it compare to existing basic merge?
    • benefits? (from user perspective, this mostly comes down to visual quality?)
    • functionality? (is it only 0.5+0.5 or can it do other?)
    • performance?
    • requirements? (vram)
  • what are the best defaults?
  • which tunables should be exposed? (that create value for users)

This is fair... I think a lot of these are a yes, but let's look at it after the double checks and result fill-ins. I'm serious enough to say I would use this for most of my own 50 percent merges regardless of whether you add it in. Most of the presets can probably be left out, though it would be nice to have the option to include them.

Sure what's the discord address? https://discord.gg/X8U6ycVk

I'll take a look. I don't normally use discord.

@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing block merge. They are implemented in presets.py regarding logging, meh already logs vram for rebasin in the dev branch there’s some initial proper logging added. That would make it in release 0.8 also, a lot of the experiments you are doing are being also done in our small discord server. Perhaps you can join and compare results

I'll take a closer look, but I want more specs than just VRAM, i.e. RAM, CPU, temps...

vladmandic commented 1 year ago

Having a pull-down menu with presets is not a problem, no matter how many there are. Having 10+ different checkboxes or number ranges is.

Regarding the logger, if you want low-level, GPU-Z has a built-in logger and nothing beats its sensors. Even if you don't like the built-in logger, I'd suggest searching for something that uses their sensor data.

DirtyHamster commented 1 year ago

@vladmandic Will check it out. I'm doing this between a lot of yard work, so I might crap out for a few days coming up; I have around 2 tons of small stone to move that just got dropped off yesterday, and a deck to fix.

All the fields available via the current help listing for meh:

Usage: merge_models.py [OPTIONS]

Options:
  -a, --model_a TEXT
  -b, --model_b TEXT
  -c, --model_c TEXT
  -m, --merging_method [add_difference|distribution_crossover|euclidean_add_difference|filter_top_k|kth_abs_value|multiply_difference|ratio_to_region|similarity_add_difference|sum_twice|tensor_sum|ties_add_difference|top_k_tensor_sum|triple_sum|weighted_subtraction|weighted_sum]
  -wc, --weights_clip
  -p, --precision INTEGER
  -o, --output_path TEXT
  -f, --output_format [safetensors|ckpt]
  -wa, --weights_alpha TEXT
  -ba, --base_alpha FLOAT
  -wb, --weights_beta TEXT
  -bb, --base_beta FLOAT
  -rb, --re_basin
  -rbi, --re_basin_iterations INTEGER
  -d, --device [cpu|cuda]
  -wd, --work_device [cpu|cuda]
  -pr, --prune
  -bwpa, --block_weights_preset_alpha [GRAD_V|GRAD_A|FLAT_25|FLAT_75|WRAP08|WRAP12|WRAP14|WRAP16|MID12_50|OUT07|OUT12|OUT12_5|RING08_SOFT|RING08_5|RING10_5|RING10_3|SMOOTHSTEP|REVERSE_SMOOTHSTEP|2SMOOTHSTEP|2R_SMOOTHSTEP|3SMOOTHSTEP|3R_SMOOTHSTEP|4SMOOTHSTEP|4R_SMOOTHSTEP|HALF_SMOOTHSTEP|HALF_R_SMOOTHSTEP|ONE_THIRD_SMOOTHSTEP|ONE_THIRD_R_SMOOTHSTEP|ONE_FOURTH_SMOOTHSTEP|ONE_FOURTH_R_SMOOTHSTEP|COSINE|REVERSE_COSINE|TRUE_CUBIC_HERMITE|TRUE_REVERSE_CUBIC_HERMITE|FAKE_CUBIC_HERMITE|FAKE_REVERSE_CUBIC_HERMITE|ALL_A|ALL_B]
  -bwpb, --block_weights_preset_beta [same choices as -bwpa]
  -j, --threads INTEGER
  --help  Show this message and exit.

To give you an idea of my inputs for testing:

Basic re-basin only seems to require the following:

    merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fixclip-noema-fp32.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper5_Bakedvae_fp16-noema.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test5\0-merge000001_10.safetensors -f safetensors -ba 0.5 -bb 0.5 -rb -rbi 10

What s1dlx has would give us a lot more options regardless of whether re-basin is better or not, though I do think it is better for 50:50s.

On a side note, is it possible to use the default file explorer to select the model, rather than a drop-down in the UI, when selecting the main models, VAEs, or other components? I have so many that being able to sort them is an issue; I don't know if anyone else is this crazy with collecting them.

I'll take a look at GPU-Z later tonight too. Have to get back to work for a bit.

vladmandic commented 1 year ago

On a side note, is it possible to use the default file explorer to select the model rather than a drop-down in the UI when selecting the main models, VAEs, or other components?

whatever you select via explorer needs to be validated and matched to some known entry so that exact entry can be used moving forward. for example, the server knows about models it enumerated - trying to select something it doesn't already know of is a mess. definitely doable, but non-trivial.

regarding params, i just noticed one thing - i need to take a look at the code to see if git-re-basin allows passing an actual device instead of a string (cpu or cuda), as sdnext itself supports quite a few other backends and if i integrate this, i cannot say "oh, this is a cuda-only feature".
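
for reference, the usual pattern is to accept either form and normalize through torch.device, so strings and device objects both work - a sketch of the idea, not what sd-meh currently does:

    import torch

    def normalize_device(device):
        # accepts "cpu", "cuda", "cuda:1", an existing torch.device, etc.
        dev = torch.device(device)
        if dev.type == "cuda" and not torch.cuda.is_available():
            raise ValueError("cuda requested but not available")
        return dev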

DirtyHamster commented 1 year ago

I haven't tried the swap arg on device since version 4; at that point I couldn't get the option arg to work. I think by default it's set to cpu.

I found the drop-downs a little messy to find stuff in during testing; it's 10x of any model involved. So it has me thinking: can we do sort orders and such? I think I'm beyond normal use-case scenarios, but it's still something to think about UI-wise.

s1dlx commented 1 year ago

meh's cli tool has that many arguments, but the library is much simpler. Basically all the presets stuff is not included, as those simply override "wa" and "ba" (and the beta ones).

on the cuda/cpu side…I imagine you can change cuda with “gpu” and get the same result on amd cards

an example of how to use the library is given by the sd-webui-bayesian-merger extension

the idea of the library is to be a generic merging one, not just a tool for rebasin

DirtyHamster commented 1 year ago

@s1dlx

meh cli tool has that many arguments but the library is much simpler. Basically all the presets stuff is not included as those simply override “wa” and “ba” (and the beta ones).

I meant that in terms of what fields might be needed in the UI, not that it was overly complex or anything. I think what you have going on in your repository is much better than the basic one we are using currently.

@vladmandic I'll probably get back to testing tomorrow. Other than the existing last few add-ins for comparison, the only other option I can test between the current method and meh is add difference. Should I run tests on that too?

vladmandic commented 1 year ago

Maybe just a quick test to see if anything deeper is really needed?

DirtyHamster commented 1 year ago

@vladmandic

Maybe just a quick test to see if anything deeper is really needed?

I'll do light testing on it while I finish up the other stuff. I want to do the time trials too. I haven't found any issues yet that would concern anyone, so it's probably safe to start looking at the code to incorporate it. You kind of make out like a bandit with all the extras that are packed into the repository.

I'll run this off too:

btw, totally off-topic, since you've already mentioned the lyriel model before (and that's one of my favorites), i'd be curious (and this is just personal) how it merges with something like https://civitai.com/models/18798/meinaunreal

I'll do it as 0.1 to 0.9 same as in the tests. Haven't forgotten about it.

DirtyHamster commented 1 year ago

Finished outputting the fp32 merges that I had to re-output; about to start working on the fp16s... I really shouldn't have deleted them lol... Afterthoughts are great, right...

Just figured I'd do a small status update; I noticed the time clock from the last merge, as it's still running from last night. Thought you might find it amusing, as it's the only thing I've managed to break...

(image: screenshot of the broken merge time clock)

In about an hour or 2 I'll have the rest of the results out. The extended testing is still to come though.

DirtyHamster commented 1 year ago

@vladmandic basic testing is done; I'll get on with the extras:

Results so far for discussion if necessary:
Model 1: https://huggingface.co/Deltaadams/HD-22 fp32
Model 2: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models pruned from full trainable ema models to fp32 no-ema and fp16 no-ema prior to testing.

Testing sampler and size settings: DPM++ 2M Karras @ 20 steps, CFG scale 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts used: "a headshot photographic portrait of a woman", "a cat as a DJ at the turntables"

Testing regimen: (multiplier to be run from 0.1 to 0.9)

base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16 xyz_grid-0032-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32 xyz_grid-0031-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 xyz_grid-0027-1897848000-a cat as a DJ at the turntables xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

The git-re-basin side will be similarly mirrored: (weight value set at .5:.5, iteration value to be run from 1 to 10)

Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0016-1897848000-a cat as a DJ at the turntables xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}

xyz_grid-0019-1897848000-a cat as a DJ at the turntables xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0024-1897848000-a cat as a DJ at the turntables xyz_grid-0025-1897848000-a headshot photographic portrait of a woman

DirtyHamster commented 1 year ago

@vladmandic The lyriel x meinaunreal base merge set is finished. Do you have a favored prompt you would like me to use with them? I can do them across multiple VAEs if you'd like too. I still have to finish up the re-basin for them; just figured I'd ask before posting outputs.

Looking at https://openhardwaremonitor.org/ and https://www.techpowerup.com/download/techpowerup-gpu-z/ for the hardware testing portion at the moment. Other suggestions are welcome.

vladmandic commented 1 year ago

@DirtyHamster naah, use whatever you want.

DirtyHamster commented 1 year ago

@vladmandic Just figured I'd ask first. I have one in mind.

DirtyHamster commented 1 year ago

@vladmandic Outputs from the merge, plus I found a funny bug that I can't seem to replicate.

(image grids xyz_grid-0041 through xyz_grid-0048, seed 1897848000, prompt "ultra-high detail (HDR:1) 8k (realistic:1.5) (masterpiece:1.5) (photorealistic:1.5) (photorealism:1.5), daringly cinematic, ...")

Prompts and settings on this:

Steps: 32, Sampler: DPM++ 2M, CFG scale: 12.5, Seed: 1897848000, Size: 716x512, Model: {x}, VAE: {y}, Clip skip: 3

prompt: "ultra-high detail (HDR:1)" 8k (realistic:1.5) (masterpiece:1.5) (photorealistic:1.5) (photorealism:1.5), "daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a (realistic "woodland rocky brook with a ("small waterfall")") and fog in background at ((sunset) "colorful sky"),

negative prompt: watermark, signature, "lowres", "bad quality", "low quality", "lowest quality", "worst quality", blurry, pixelated, drawling, sketch, sketched, painted,

For this little bit of side testing I thought to use water because it's such an easily recognized abstraction in its many forms. Also, from some of our other chatter, I figured I'd run them against all the VAEs I've come across, extracted or otherwise, to see what the change is. Hope you enjoy...

The strange bug: when the server goes to sleep, the in-browser UI sometimes seems to move the clip skip position regardless of where you have it set. I have not found any logical behavior to it, but the outputs here are at clip skip 3 while I have it set to 1. I think this was responsible for an earlier error which I quickly blamed on myself... It's correct on the listing below the output though.

vladmandic commented 1 year ago

loving how it clearly shows the progression!

btw, totally off-topic, your usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:

[['ultra-high detail HDR 8k', 1.0], ['realistic masterpiece photorealistic photorealism', 1.5], ['daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a', 1.0], ['realistic "woodland rocky brook with a', 1.1], ['small waterfall"', 1.21], ['and fog in background at', 1.0], ['sunset', 1.21], ['colorful sky"', 1.1]]
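
a stripped-down illustration of why: the parser only treats parentheses and :weight suffixes as syntax, so quote characters just become literal prompt text (simplified stand-in, not the actual sdnext parser, which also handles nesting, [...] and escapes):

    import re

    ATTN = re.compile(r"\(([^()]+):([0-9.]+)\)")  # matches (text:weight) only

    def parse(prompt):
        parts, last = [], 0
        for m in ATTN.finditer(prompt):
            if m.start() > last:
                parts.append([prompt[last:m.start()].strip(), 1.0])
            parts.append([m.group(1).strip(), float(m.group(2))])
            last = m.end()
        if last < len(prompt):
            parts.append([prompt[last:].strip(), 1.0])
        return [p for p in parts if p[0]]

    print(parse('"ultra-high detail" 8k (realistic:1.5) photo'))
    # [['"ultra-high detail" 8k', 1.0], ['realistic', 1.5], ['photo', 1.0]]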

re: clip skip - knowing how it's monkey-patched in the backend for backward compatibility, it's quite possible.

DirtyHamster commented 1 year ago

loving how it clearly shows the progression!

I've found that sometimes the merges between the sub-tenth and hundredth ranges have interesting stuff going on too.

btw, totally off-topic, you're usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:

I mostly use quotes for concept groupings, so some of it is just me keeping track of them as modular concepts when cutting and pasting from my notes, and moving them around in the prompt. I generally just leave them in, since using them this way doesn't generate errors. Sometimes I do mess with them to see if I can find any difference in complex phrasing. However, the only guidance I've really seen for quotation syntax has been a few lines about Prompt S/R using it similarly for grouping in list form, i.e. no spaces between quotes and separating commas.

vladmandic commented 1 year ago

set SD_PROMPT_DEBUG=1 and you can see the parsed result.
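
e.g. set SD_PROMPT_DEBUG=1 in a Windows cmd shell (or SD_PROMPT_DEBUG=1 ./webui.sh on Linux) before launching - the parsed prompt should then show up in the console output.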

DirtyHamster commented 1 year ago

Pardon the delay; I'm just getting back up to speed. A bad electrical storm took out every switch, router, modem, and cable box in the house last weekend, and it took a while to get all of that replaced; I still have some odds and ends to fix.

Which file am I setting the arg set SD_PROMPT_DEBUG=1 in?

vladmandic commented 1 year ago

Just do it in the shell before starting webui.

DirtyHamster commented 1 year ago

@vladmandic Ok, I did one large merge test today at 50 iterations for the hardware test; it ran for 16m30s. I logged the run using Open Hardware Monitor at 30s intervals. The CSV file is the hardware log from that, which should be adequate data for any lesser run.

I used the full unpruned versions, letting them be pruned via meh so it had a little extra work: dreamshaper_5BakedVae_fp32.safetensors 7.2gb, HD-22-fp32-fixclip.safetensors 7.5gb. Resulting merge file: 8.8gb (it returns everything that it prunes out to the model at the end, as stated before).
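
(For reference, the ema pruning itself is conceptually simple - a sketch of dropping the ema shadow weights from a safetensors checkpoint, not meh's actual implementation:)

    from safetensors.torch import load_file, save_file

    def prune_ema(in_path, out_path):
        """Drop EMA weights (keys under model_ema.), keeping the trained weights."""
        state = load_file(in_path)
        pruned = {k: v for k, v in state.items()
                  if not k.startswith("model_ema.")}
        save_file(pruned, out_path)
        print(f"removed {len(state) - len(pruned)} ema tensors")

    # prune_ema("dreamshaper_5BakedVae_fp32.safetensors", "dreamshaper_noema.safetensors")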

OpenHardwareMonitorLog-2023-07-09.csv

I'm pruning the output model prior to image grid generation, to be compared with the base models fp32-merged at 0.5 and meh's at iterations 10 and 50. Just a reminder: these should all be fairly similar; this is just to look for differences among the center-point merges.

xyz_grid-0000-a headshot photographic portrait of a woman xyz_grid-0001-a cat as a DJ at the turntables

The differences I can spot are really small: pixel differences at the edges of things, depth of blacks, and small details with more iterations, especially in the collar area of the woman's top. With the prompts used I still think the best run was on the two fp16s, where there was very good detail pick-up across the board.

I haven't had a lot of time lately to run these off, or I would have done some more iterations between 10 and 50. Time allowing, I'll do iterations of 100 and 200 as a follow-up later, since we know that merge method works already. So I think I should move on and try the other merge methods later in the week. I'm still not sure what a good comparison against the block_weights presets would be, beyond just checking that they work.

vladmandic commented 1 year ago

i think we should just check if there are any visual differences when using presets, to see the value of including them or not. doesn't matter if we like them or not; the question is whether the differences are even noticeable. other than that, i think we can wrap up testing and start on the actual pr.

DirtyHamster commented 1 year ago

Last few things on my list before testing the presets...

Did a little side test using pix2pix and caught an error; I reported it over on meh's repo. Just noting it here so you are aware of it, in case you have any insight: https://github.com/s1dlx/meh/issues/38

I need to check inpainting too, as it could cause a similar error since it's also additional model architecture. Checked and filed an error report: https://github.com/s1dlx/meh/issues/42

s1dlx commented 1 year ago

@DirtyHamster pix2pix and inpainting should be fixed in sd-meh 0.9.4 (originally slated for 0.9.0)

vladmandic commented 8 months ago

git re-basin has been added to sdnext. there are some remaining issues that need to be addressed, but they are handled via separate issues.