vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Feature]: git-re-basin potential improvement for model merge #1176

Closed DirtyHamster closed 6 months ago

DirtyHamster commented 1 year ago

Feature description

I spent a fair portion of the week reading about the various types of block merging methods for models that are currently available. One paper in particular entitled "GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES" really caught my attention and I thought you might find it interesting as well. Paper is available here: https://arxiv.org/abs/2209.04836
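To sketch the core idea for anyone skimming: instead of averaging weights blindly, the algorithm first searches for a permutation of model B's hidden units that lines them up with model A's, then interpolates between matched units. A toy illustration of that matching step for a single layer, using scipy's linear sum assignment (my own sketch, not the authors' code):

```python
# Toy sketch of the weight-matching idea, reduced to one linear layer.
# In a real network the same permutation must also be applied to the input
# dimension of the following layer; that bookkeeping is what the paper's
# permutation spec handles. Not the authors' implementation.
import torch
from scipy.optimize import linear_sum_assignment

def match_and_merge(w_a: torch.Tensor, w_b: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    # w_a, w_b: (out_features, in_features) weights of the same layer in two models
    cost = (w_a @ w_b.T).numpy()                   # similarity between A's and B's output units
    _, cols = linear_sum_assignment(cost, maximize=True)
    perm = torch.as_tensor(cols)                   # permutation aligning B's units with A's
    return (1 - alpha) * w_a + alpha * w_b[perm]   # interpolate matched units, not arbitrary ones

# toy usage
merged = match_and_merge(torch.randn(8, 4), torch.randn(8, 4), alpha=0.5)
```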

Is there a reasonable way to test their method against what we already use to see if it would be an improvement? What merge method are you currently using, so I can better understand what I'd be testing against? I had also thought about testing against the authors' own proof of method, if their original code is available. I figured I'd write this up in case someone else might be interested in testing it too.

My thought was that this could potentially be added as an "auto" option under interpolation. Since the weights are auto-guided, it seems to follow your idea of ease of use. Some of the more manual methods have users setting the independent in and out weight values for each model block, which would also be nice to be able to do without leaving the core UI.

(conversation went on from there)

I did a little extra searching around from that: Just a simple GUI for the code: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui

GUI used in some of the testing: https://github.com/axsddlr/sd-gui-meh

Code explored: https://github.com/samuela/git-re-basin https://github.com/themrzmaster/git-re-basin-pytorch They have some code in the pull section for dealing with safe tensors partially as well: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui/pull/1

Code used in testing: https://github.com/s1dlx/meh

Results are collected from the comments below, after the testing method was agreed on:


Model 1: https://huggingface.co/Deltaadams/HD-22 (fp32)
Model 2: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models were pruned from full trainable EMA models to fp32 no-EMA and fp16 no-EMA prior to testing.

Testing sampler and size settings: DPM++ 2M Karras @ 20 steps, CFG scale 7, Seed: 1897848000, Size: 512x716, CLIP skip: 4. Prompts used: "a headshot photographic portrait of a woman", "a cat as a DJ at the turntables". (A sketch for replaying these settings through the API follows the test lists below.)

Testing regimen: (multiplier to be run from 0.1 to 0.9)

base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16 xyz_grid-0032-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32 xyz_grid-0031-1897848000-a headshot photographic portrait of a woman

base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 xyz_grid-0027-1897848000-a cat as a DJ at the turntables xyz_grid-0026-1897848000-a headshot photographic portrait of a woman

The git-re-basin side will be similarly mirrored: (weight value set at .5:.5, iteration value to be run from 1 to 10)

Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0016-1897848000-a cat as a DJ at the turntables xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}

xyz_grid-0019-1897848000-a cat as a DJ at the turntables xyz_grid-0020-1897848000-a headshot photographic portrait of a woman

Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}

xyz_grid-0024-1897848000-a cat as a DJ at the turntables xyz_grid-0025-1897848000-a headshot photographic portrait of a woman
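For repeatability, these fixed settings can be replayed against a running SD.Next instance through the A1111-compatible txt2img API; a rough sketch (endpoint and field names follow that API, default port assumed):

```python
# Hedged sketch: replay the fixed test settings via the A1111-compatible txt2img API.
# Field names follow the A1111 API schema; adjust host/port to your install.
import requests

payload = {
    "prompt": "a headshot photographic portrait of a woman",
    "negative_prompt": "",
    "seed": 1897848000,
    "steps": 20,
    "cfg_scale": 7,
    "width": 512,
    "height": 716,
    "sampler_name": "DPM++ 2M Karras",
    "override_settings": {"CLIP_stop_at_last_layers": 4},  # CLIP skip 4, setting name as in the A1111-style API
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()
images_base64 = resp.json()["images"]  # list of base64-encoded PNGs
```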

Version Platform Description

Latest published version: e04867997e8903b9f44b75d073ef0be8c3159c12 2023-05-25T21:13:56Z

vladmandic commented 1 year ago

the algo currently implemented by model merger is near-trivial and there is no intelligence whatsoever. you can see it at https://github.com/vladmandic/automatic/blob/95242ca7d6e8da8dde4361fe78abfa9679d72d4e/modules/extras.py#L59

if you can do any quantifiable test of the new algo showing the results, i'd love to include this code as you've suggested.
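for reference, that merge boils down to a plain per-tensor interpolation along these lines (a simplified sketch of the idea, not the exact extras.py code):

```python
# rough sketch of a plain weighted-sum checkpoint merge: A * (1 - M) + B * M per tensor.
# not the exact extras.py implementation.
import torch

def weighted_sum_merge(theta_a: dict, theta_b: dict, multiplier: float) -> dict:
    merged = {}
    for key, tensor_a in theta_a.items():
        if key in theta_b:
            merged[key] = (1.0 - multiplier) * tensor_a + multiplier * theta_b[key]
        else:
            merged[key] = tensor_a  # keys only present in A are kept as-is
    return merged

theta_a = torch.load("model_a.ckpt", map_location="cpu")["state_dict"]  # placeholder filenames
theta_b = torch.load("model_b.ckpt", map_location="cpu")["state_dict"]
torch.save({"state_dict": weighted_sum_merge(theta_a, theta_b, 0.5)}, "merged.ckpt")
```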

DirtyHamster commented 1 year ago

I found the code for ours but I just couldn't figure out what the method is called so I could read more about it. lol... I thought perhaps it had a proper name rather than just being the "standard webui type of thing", which is pretty much all I got back from my searching.

Test-wise, what I'm thinking is taking 2 models and following them with images across the 2 merging routines. The issue I was having was coming up with a good test. The methods are different: in the git-re-basin method there is an alpha, which I'm thinking is roughly equivalent to our multiplier M, but they also have iterations. Their entire paper only mentions iterations once: "Our experiments showed this algorithm to be fast in terms of both iterations necessary for convergence and wall-clock time, generally on the order of seconds to a few minutes."

Our current method just does the simplest merge; it doesn't aim for this idea of convergence. I haven't tried to break their software yet, so I don't know what the maximum iteration count is. My base thought was doing 1 to 10 iterations on the new method. I'd like to do the testing in a repeatable way that could potentially provide a default for the iterations value, since our models will probabilistically have more similarities than dissimilarities in general, provided they are trained on the natural world.

On the standard merge I could do .1 stepping to 1, i.e. 10 steps. On the git-re-basin side I could do the same .1 stepping to 1 @ 1 iteration. Should this be followed up with, say, 10 iterations? Since iterations aren't part of the base method I'm not sure how to position this.

I want to make sure that it's quantifiable too, hence the excess of questions. Do you have any suggestions of publicly available models to use? Can we make a decision on the X and Y models for the testing: the base that is downloaded via the vlad install (sd1.5) and something else as a secondary? It's hard to pick because there are a lot of small pruned models, and I've had different experiences merging pruned and unpruned models. I was thinking of using this one, as its authors used the manual method and have it listed in their notes: https://civitai.com/models/18207/ether-real-mix. You can see the in and out block weights and alpha listed as a manual operation, like 0,0.8,0.8,0.8,0.8,0.8,0,0,0,0,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0,0; these are the block weights they used in their merge, and you count to the center number for the in and out halves. They've been using a non-default method, but at least I know where their blocks are to some degree, and it diverges from base enough style-wise. (A rough sketch of what per-block weighting means follows below.)

What would be satisfiable quantification for testing on this?
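For reference, manual block-weight merging just means picking the interpolation weight per UNet block by key prefix; a rough sketch of the idea (block indexing simplified, not the MBW extension's actual code):

```python
# Rough sketch of per-block weighted merging over SD-v1 state_dicts (torch tensors):
# a different multiplier per UNet block, chosen by key prefix. Not the MBW extension's code.
import re

def block_multiplier(key, in_weights, mid_weight, out_weights, base):
    m = re.match(r"model\.diffusion_model\.input_blocks\.(\d+)\.", key)
    if m:
        return in_weights[int(m.group(1))]          # 12 "in" weights, input_blocks 0..11
    if key.startswith("model.diffusion_model.middle_block."):
        return mid_weight
    m = re.match(r"model\.diffusion_model\.output_blocks\.(\d+)\.", key)
    if m:
        return out_weights[int(m.group(1))]         # 12 "out" weights, output_blocks 0..11
    return base                                     # everything outside the UNet blocks (VAE, CLIP, ...)

def block_merge(theta_a, theta_b, in_weights, mid_weight, out_weights, base=0.5):
    merged = {}
    for key, a in theta_a.items():
        if key in theta_b:
            m = block_multiplier(key, in_weights, mid_weight, out_weights, base)
            merged[key] = (1.0 - m) * a + m * theta_b[key]
        else:
            merged[key] = a
    return merged
```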

vladmandic commented 1 year ago

i like the proposed methodology, only thing i'd propose is to use two full fp32 models as the base models and then create fp16 versions of them. so we'd have 3 models, each in 2 variations

and then run merges using different variations (total of 6 tests):

for custom model, how about https://civitai.com/models/4384/dreamshaper? that's a full 5.5gb fp32 model with baked in vae. and you can pick a second one.

regarding iterations - good question - i have no idea what the best setting would be. maybe try two extremes and see what the differences are; if they are significant, we can leave it as an exposed setting in the ui rather than predetermining it.

DirtyHamster commented 1 year ago

My intention for the iteration value was to leave it exposed but with a default given, i.e. whatever looks best from the test results gets listed as the default. Then if others decide to repeat the test, that default could be averaged out to a best-of value and still be easily changed. Honestly I'll try to break it and report on that too, so we can eliminate some troubleshooting in the future if possible. I just want to avoid shipping a value of 1 and having 1 not be high enough to make a significant change.

I agree on using fp32 models in the test too; I probably would have forgotten to include them if you hadn't mentioned it. Since the initial state of their code doesn't deal with safetensors files, I'll convert to checkpoint files first to avoid any hiccups. Safetensors support can be dealt with later.
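The conversion itself is just a load/save round trip; a minimal sketch of what I mean (filename is a placeholder, and the plain {"state_dict": ...} wrapper is an assumption about what the re-basin script expects):

```python
# Minimal safetensors -> ckpt round trip; filenames are placeholders.
import torch
from safetensors.torch import load_file

state_dict = load_file("dreamshaper_5BakedVae.safetensors")           # flat dict of tensors
torch.save({"state_dict": state_dict}, "dreamshaper_5BakedVae.ckpt")  # ckpt wrapper most loaders expect
```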

I'll use https://github.com/arenasys/stable-diffusion-webui-model-toolkit to evaluate and check the models before and after the merges to look for any changes or problems. Think about including this in the train tab; it's really useful for figuring out what's going on component-wise inside models that don't run. I don't expect much to change between similar models, but it's good data for documentation.

This regimen is fine for me:

(Multiplier to be run from .1 to 1)

The git-re-basin side will be similarly mirrored:

(Multiplier to be run from .1 to 1)

I think for the sampling method I'm going to stick with DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5, which are settings I know generally work for every model I've tried so far. This could be expanded on later, but as a first run I don't want to overcomplicate it too much.

I was thinking of using some of the textual inversion template prompts for the default text prompt. Something fairly basic: a photo of a [name], a rendering of a [name], a cropped photo of the [name], the photo of a [name], a photo of a clean [name], a photo of a dirty [name], a dark photo of the [name]. My own prompts end up being overly complex, so I'm trying to keep it simple. I'll just replace [name] with woman or man.

I was thinking of not using a negative prompt by default, beyond maybe nsfw or something similar, so as not to affect image quality.

Using https://civitai.com/models/4384/dreamshaper: when I just checked, it says it's Full Model fp16 (5.55 GB), jumping between versions. Version 5 inpainting is Full Model fp32 (5.28 GB); would that version be ok, since it satisfies the fp32 requirement?

I have a copy of Hentai Diffusion at fp32; it's similarly a larger unpruned model. Using the prompt "a photo of a woman" I generally get something like this out of it @ 1440x810: (I use this size intentionally to look for width-duplication tiling in models, and it's a high enough resolution that I can avoid using hires fix and/or restore faces.)

(image snipped as not needed)

I had used this in another, undocumented merge test, trying to go from 2d illustrative to 3d realistic styles (particularly dealing with faces) using the standard method, so it might contrast well by base style. If you have a better suggestion for a second model feel free to suggest it; I don't mind at all, as the inclusion of the method is my end goal. My other thought was to look for a very realistic model and contrast it that way. I think the test would probably prove out in the in-between regardless, though.

If we can square that off then I'll retype up the combined test method for the initial post and get started on testing.

vladmandic commented 1 year ago

Using: https://civitai.com/models/4384/dreamshaper when I just checked it says it's Full Model fp16 (5.55 GB)

i think that's likely a typo, i don't see how fp16 model can be that big.

regarding prompt and sampler - pick anything as long as the image is detailed enough so we can pick up if there is any loss of details.

everything else you wrote, no issues.

DirtyHamster commented 1 year ago

The HD model at fp32 is 7.5gb, which is much larger and is the only reason I'm questioning the dreamshaper one. A fair number of my unpruned fp16 models are 4.4gb, which isn't that far off size-wise from 5.5gb... I'll look at this tonight when I move the models into their folders and start trying to work out a seed and prompt that gives a fair close-up of face and shoulders for both models. I've only downloaded it so far. I'm hoping to use the same seed on both models, so it will take me a few tries to find one that works. I'll get back to you on this before I start testing.

multiplier on the re-basin will be called alpha as that's what they use in the paper.

When I narrow down a seed to work with using the settings DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5, I want to do a simple not-on-my-computer check that you or anyone else available can get the same or a similar image first. I'll do this for each model used in the test as a default base image. This should be the only external "compu time" help I need before going into the testing phase of the actual merging. If that doesn't work, then we have to look at our settings to see if there is some interference going on before moving forward. This is just for repeatability of testing.

I'll go over some of this tonight and pick up on Tuesday, as I have a busy day tomorrow. I'll append the method test to my original post and then provide the test data in the comments, similar to how this is posted. Maybe one or two more posts before testing; I'll look into the dreamshaper issue and let you know what I find beforehand. I think we agree on most of the other stuff, which is good.

vladmandic commented 1 year ago

sounds good on all items. re: dreamshaper - as long as you use one fp32 model, i don't care which one it is, so don't spend too much time looking into dreamshaper.

DirtyHamster commented 1 year ago

Understood, I'll make sure I post links to both models used also. I'll try your suggestion first since you might have more experience using that one and can notice changes. I'm going to try to use 2 fp32 models and then bring them both down to fp16 just to be clear on that.

DirtyHamster commented 1 year ago

Using the [model toolkit](https://github.com/arenasys/stable-diffusion-webui-model-toolkit):

Note: this model has a correctable error: CLIP has incorrect positions, missing: 41. To clear the error, run it through model convert with the appropriate values. The error report is given first and the fixed report after that.

Under "basic", for Model 1 (https://github.com/Delcos/Hentai-Diffusion, fp32, 7.5gb) as listed above, I get the following default report with the error:

Report (1676/1444/0102) Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 3.20 GB of junk data! Wastes 1.99 GB on precision. CLIP has incorrect positions, missing: 41.

Model will be pruned to 3.97 GB. (note I'm not pruning this or making alterations right now)

Under advanced the report is as follows:

Report (1676/1444/0102) Statistics Total keys: 1831 (7.17 GB), Useless keys: 686 (3.20 GB).

Architecture SD-v1 UNET-v1 UNET-v1-SD VAE-v1 VAE-v1-SD CLIP-v1 CLIP-v1-SD Additional EMA-v1 EMA-UNET-v1 UNET-v1-EMA Rejected UNET-v1-Inpainting: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3) UNET-v1-Pix2Pix: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3) UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686) model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3) UNET-v2-SD: Missing required keys (64 of 686) model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280) model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024) model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640) … UNET-v2-Inpainting: Missing required keys (65 of 686) model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280) model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024) model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640) … UNET-v2-Depth: Missing required keys (65 of 686) model.diffusion_model.output_blocks.4.1.proj_out.weight (1280, 1280) model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight (1280, 1024) model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight (640, 1024) model.diffusion_model.output_blocks.7.1.proj_out.weight (640, 640) … SD-v1-Pix2Pix: Missing required classes UNET-v1-Pix2Pix SD-v1-ControlNet: Missing required classes ControlNet-v1 SD-v2: Missing required classes CLIP-v2 UNET-v2 SD-v2-Depth: Missing required classes CLIP-v2 UNET-v2-Depth Depth-v2

With the error fixed the report comes out as the following:

Basic Report:

Report (1676/1444/0102) Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 3.20 GB of junk data! Wastes 1.99 GB on precision. (no changes other than clip fix have been made)

Model will be pruned to 1.99 GB.

Statistics Total keys: 1831 (7.17 GB), Useless keys: 686 (3.20 GB).

Architecture SD-v1 UNET-v1 UNET-v1-SD VAE-v1 VAE-v1-SD CLIP-v1 CLIP-v1-SD Additional EMA-v1 EMA-UNET-v1 UNET-v1-EMA Rejected UNET-v1-Inpainting: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3) UNET-v1-Pix2Pix: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3) UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686) model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3) UNET-v2-SD: Missing required keys (64 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … UNET-v2-Inpainting: Missing required keys (65 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … UNET-v2-Depth: Missing required keys (65 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … SD-v1-Pix2Pix: Missing required classes UNET-v1-Pix2Pix SD-v1-ControlNet: Missing required classes ControlNet-v1 SD-v2: Missing required classes UNET-v2 CLIP-v2 SD-v2-Depth: Missing required classes CLIP-v2 UNET-v2-Depth Depth-v2

The next post will look at the suggested model 2. (Note: I didn't see anything here that explicitly tells me whether it's fp16 or fp32, which I was hoping would be identified.) If someone has some extra time I'd really like to know more about these missing keys and classes (this can be dealt with later though).

DirtyHamster commented 1 year ago

model 2:

Basic report:

Model is 5.55 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

Contains 1.60 GB of junk data! Wastes 1.97 GB on precision.

Uses the SD-v2 VAE.

Model will be pruned to 1.99 GB. (not altering the model at this point)

Advanced report:

Statistics Total keys: 1819 (5.55 GB), Useless keys: 686 (1.60 GB).

Architecture SD-v1 UNET-v1 UNET-v1-SD VAE-v1 VAE-v1-SD CLIP-v1 CLIP-v1-SD Additional EMA-v1 EMA-UNET-v1 UNET-v1-EMA Rejected UNET-v1-Inpainting: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 9, 3, 3) UNET-v1-Pix2Pix: Missing required keys (1 of 686) model.diffusion_model.input_blocks.0.0.weight (320, 8, 3, 3) UNET-v1-Pix2Pix-EMA: Missing required keys (1 of 686) model_ema.diffusion_modelinput_blocks00weight (320, 8, 3, 3) UNET-v2-SD: Missing required keys (64 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … UNET-v2-Inpainting: Missing required keys (65 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … UNET-v2-Depth: Missing required keys (65 of 686) model.diffusion_model.middle_block.1.proj_in.weight (1280, 1280) model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight (320, 1024) model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight (320, 1024) model.diffusion_model.output_blocks.7.1.proj_in.weight (640, 640) model.diffusion_model.output_blocks.5.1.proj_out.weight (1280, 1280) … SD-v1-Pix2Pix: Missing required classes UNET-v1-Pix2Pix SD-v1-ControlNet: Missing required classes ControlNet-v1 SD-v2: Missing required classes UNET-v2 CLIP-v2 SD-v2-Depth: Missing required classes CLIP-v2 UNET-v2-Depth Depth-v2

DirtyHamster commented 1 year ago

On model one I noticed the report states "CLIP has incorrect positions, missing: 41." I'm going to run this through model converter to see if the clip fix will repair that. If that works and clears the error I'm fine with that; I'll just edit the above post with the repair instructions. If not, I'll pick a different model to use.

DirtyHamster commented 1 year ago

ok, passing it through model converter seemed to clear that error. I now have 2 saved copies of that model to work from. I'm editing my original model report, appending to the end rather than removing the erroneous one.

Converting model... 100%|██████████████████████████████████████████████████████████████████████████| 1831/1831 [00:00<00:00, 915129.96it/s]
fixed broken clip [41]
Saving to \AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt...
Saving to \AI_Models\Stable Diffusion\HD-22-fp32-fixclip.safetensors...
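For reference, the usual cause of that "missing: 41" report is a damaged position_ids tensor in the CLIP text encoder, and the repair amounts to restoring it to 0..76. A rough sketch of the same fix done by hand (key name as used in SD-v1 checkpoints; the toolkit's exact implementation may differ):

```python
# Sketch of the usual "broken CLIP position_ids" repair: the tensor should simply be
# [[0, 1, ..., 76]] as int64. Filenames follow the ones used above.
import torch

ckpt = torch.load("HD-22-fp32.ckpt", map_location="cpu")
sd = ckpt["state_dict"] if "state_dict" in ckpt else ckpt
key = "cond_stage_model.transformer.text_model.embeddings.position_ids"
if key in sd:
    sd[key] = torch.arange(77, dtype=torch.int64).unsqueeze(0)  # restore the expected positions
torch.save(ckpt, "HD-22-fp32-fixclip.ckpt")
```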

DirtyHamster commented 1 year ago

I'm going to convert the second model over to ckpt format, prep the data space (then try to clean up my mess), and call it a night; I'll pick up on Tuesday. (i.e. I have to mess around with the seeds before moving forward, which just generates lots of junk data.) I'll post the best of the seeds to pick from; if I get to a good one just stop me so I can test on that, though I don't think I'll get through it tonight. I think everything else is squared off for the test.

If you spot anything I missed just point it out.

(combined from a direct post under it)

I ran a couple of quick tests on prompts just to try to dial that in a little. Testing on this prompt on both models (for my own notes, so I know where to pick up from):

Sampler settings: DPM++ 2M Karras @ 32 steps and a CFG scale of 12.5, starting with seed 1280, resolution 900x900, CLIP skip 1:

Prompt: A head shot portrait photo of a woman. Neg prompt: empty ("this will be empty unless I hit images that are nsfw").

Seemed to be giving fairly consistent head and shoulders outputs

Last edit (just cleaned this up a little: adjusted prompt, resolution, and CLIP setting; removed older images).

vladmandic commented 1 year ago

rendering at natively high resolution is always problematic, especially with extreme aspect ratios. i find 720x640 or 640x720 the highest reliable value, anything higher should go through some kind of second pass (hires fix, ultimate upscale, multidiffusion upscaler, etc.).

brunogcar commented 1 year ago

not really sure if it's actually relevant to what you are trying to achieve, but have you seen https://github.com/Xerxemi/sdweb-auto-MBW

DirtyHamster commented 1 year ago

@vladmandic I've played around a lot with different sizes; 1440x810 is about as large as I can go without too much getting too funky. I've gotten some fairly interesting conjoined body deformities at that size too, though it's normally ok for closeups. I usually get a good laugh out of them. I'll probably cut the size in half for the test; I just want to make sure there's enough detail without doing any post-processing on them.

@brunogcar I had looked that one over initially too. In the auto-MBW method you have to set the weights manually, whereas git-re-basin leverages the computer to pick the closest matching values for the weights itself based on the data inside the models. While either method would still be an advancement over the default method, auto-MBW still wouldn't be attempting to do what git-re-basin tries to achieve when merging.

So right now all we're trying to achieve is generating a number of image sets output at known intervals, so we can compare the results of the default method vs the git-re-basin method and see if it performs at the levels they state.

Aptronymist commented 1 year ago

What about this? https://github.com/s1dlx/sd-webui-bayesian-merger

DirtyHamster commented 1 year ago

@Aptronymist that's closer to what's going to be tested, but it uses a different method. I wouldn't mind trying that one later, after the first test, as it is similarly automated. I read through a fair handful of methods and they are all fairly interesting. Most of them haven't really published a look book of images to compare the methods across, though. That's part of my interest in this: just to see what's doable in a repeatable manner.

Most of the other ones that I read through can be found listed here: https://www.sdcompendium.com/doku.php?id=engineering_merge_7.3_0106

The standard block merge method has some good tests done and some fair explanations available, but it's still kind of an unknown exactly which concepts live where in the blocks: https://rentry.co/BlockMergeExplained. That's one of the reasons I'm looking first at one of the automated alternatives. If you scroll down to look at their example images, the size is so small that it makes it a little difficult to really examine what's going on, so I want to make sure my images are large enough to be useful as well.

I had done a similar test on 2 dissimilar models (a 3d and a 2d style) previously, doing a batch of consecutive merges at multiplier value .3, but I didn't document it well enough to publish. It just gave me the idea that a standardized test would be useful for future evaluations. Some of what I noticed were odd distortions in the models as they were merged, around the point where they started to look 2.5d-ish. So part of what I want to look for is: if the concept merging happens faster and more precisely than the standard merge, will those distortions happen again, and similarly whether they will happen with the other merge method...

DirtyHamster commented 1 year ago

Tried passing a few models through tonight via the git-re-basin windows version and hit errors before each output. I'm going to try the two models I saw them use in their video (https://www.youtube.com/watch?v=ePZzmMFb8Po), as it could be an issue with the models in general or my own settings, and I have to eliminate some of those possibilities. lol... I will probably also try one of the command-line versions, as I saw more people commenting on getting that working. I have to go reread the few repositories, but will get to that soon.

Note on the command line version it looks like it runs through as many iterations as it takes to finish rather than being selectable as in the windows version. https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion

These were the errors I got:

First attempt:

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk.ckpt
            output:     HD-22-fp32-fixclip_0.1_dreamshaper_6bakedvae_chk_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 1
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS P_bg337: -0.5 P_bg358: 0.25 <class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

Program self-halted after error...

Second attempt: I noticed I forgot to click off fp16, so I tried again; possibly since they were both fp32 it ran into an issue there... Made sure I was running them as fp32.

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk.ckpt
            output:     HD-22-fp32-fixclip_0.1_dreamshaper_6bakedvae_chk_0.9_1it.ckpt
            alpha:      0.1
            usefp16:    False  
            iterations: 1
        ---------------------

Using full precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS <class 'RuntimeError'> expected scalar type Float but found Half

Program self-halted after error...

3rd attempt: try 2 known fp16s, correctly flagging them as fp16...

        ---------------------
            model_a:    HD-22-fp16-fixclip.ckpt
            model_b:    dreamshaper_6bakedvae_chk_fp16.ckpt
            output:     HD-22-fp16-fixclip_0.1_dreamshaper_6bakedvae_chk_fp16_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 1
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.1

FINDING PERMUTATIONS P_bg337: -1.0 P_bg358: 0.25 <class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

Program self-halted after error...

vladmandic commented 1 year ago

<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'

i've seen this mentioned at the dreambooth repo and the author stated that in most cases it's due to oom?

DirtyHamster commented 1 year ago

My original searches really weren't pulling much just using:

"<class 'KeyError'> 'model_ema.diffusion_modelinput_blocks00bias'" site:github.com Or "'model_ema.diffusion_modelinput_blocks00bias'" site:github.com

with or without the quotes....

I saw some mention of oom with a few of them, but others didn't mention it. I think I might have spotted that one from dreambooth too; they mention a KeyError: 'model_ema.diffusion_modelinput_blocks00bias'. Though I'm not sure if the oom mentioned later in that thread has anything to do with the first issue, beyond the usual "I got an error like that too" type of chatter. Still reading through it.

This could also be model related; I decided to test our default merger and it also throws an error when trying to merge the two models:

22:33:53-750675 INFO Version: eaea88a4 Mon May 1 21:03:08 2023 -0400
22:33:55-306870 INFO Latest published version: 8f4bc4df08c48e70ea66e16c76151f1052f034c1 2023-05-31T16:44:35Z

Available models: H:\AI_Progs\AI_Models\Stable Diffusion 81 Loading H:\AI_Progs\AI_Models\Stable Diffusion\dreamshaper_6bakedvae_chk.ckpt... Loading weights: H:\AI_Progs\AI_Models\Stable Diffusion\dreamshaper_6bakedvae_chk.ckpt ━━━━━━ 0.0/6… -:--:-- GB Loading H:\AI_Progs\AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt... Loading weights: H:\AI_Progs\AI_Models\Stable Diffusion\HD-22-fp32-fixclip.ckpt ━━━━━━━━━━ 0.0/7.7 -:--:-- GB Merging... 100%|█████████████████████████████████████████████████████████████████████████████| 1831/1831 [00:14<00:00, 126.32it/s] Saving to \AI_Models\Stable Diffusion.5-ours-hd22-dreamshaper-fp32.ckpt... API error: POST: http://127.0.0.1:7860/internal/progress {'error': 'LocalProtocolError', 'detail': '', 'body': '', 'errors': "Can't send data when our state is ERROR"} HTTP API: LocalProtocolError ╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮ │ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\starlette\middleware\errors.py:162 in │ │ call │ │ │ │ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\starlette\middleware\base.py:109 in call │ │ │ │ ... 7 frames hidden ... │ │ │ │ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\h11_connection.py:512 in send │ │ │ │ 511 │ │ """ │ │ ❱ 512 │ │ data_list = self.send_with_data_passthrough(event) │ │ 513 │ │ if data_list is None: │ │ │ │ H:\AI_Progs\VladDiffusion\automatic\venv\lib\site-packages\h11_connection.py:527 in │ │ send_with_data_passthrough │ │ │ │ 526 │ │ if self.our_state is ERROR: │ │ ❱ 527 │ │ │ raise LocalProtocolError("Can't send data when our state is ERROR") │ │ 528 │ │ try: │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ LocalProtocolError: Can't send data when our state is ERROR Checkpoint not found: None Available models: H:\AI_Progs\AI_Models\Stable Diffusion 81 Checkpoint saved to H:\AI_Progs\AI_Models\Stable Diffusion.5-ours-hd22-dreamshaper-fp32.ckpt.

It did give me an output model, however it does not see the model even after restarting the server. :D I did this twice, same error. Normally I don't have any issues with merging, you can see how many models I have lol...

Redownloading the models just to make sure there weren't download issues involved. Will run again as a double check.

Redownloaded the dreamshaper 5 fp32 and re-ran the merge on ours. I still got the same error at the end, but it at least gave me back a loadable model this time.

(snipped for duplicate error post)

Ok, so I reran the gui with version 5 and got a different error after the redownload. This error I know I've seen listed at least. Someone mentioned it could be caused by stable diffusion base model differences (merging SD 1.5 and SD 2.1 gives this error). I don't see that in the model data though:

dreamshaper: Model is 6.88 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

HD 22: Model is 7.17 GB. Multiple model types identified: SD-v1, EMA-v1. Model type SD-v1 will be used. Model components are: UNET-v1-SD, VAE-v1-SD, CLIP-v1-SD.

        ---------------------
            model_a:    HD-22-fp32-fixclip.ckpt
            model_b:    dreamshaper_5BakedVae_fp32.ckpt
            output:     HD-22-fp32-fixclip_0.3_dreamshaper_5BakedVae_fp32_0.7_4it.ckpt
            alpha:      0.3
            usefp16:    False  
            iterations: 4
        ---------------------

<class 'KeyError'> 'state_dict'

I'll keep trying later. I'm pretty sure I saw that error mentioned elsewhere in the repos.

vladmandic commented 1 year ago

LocalProtocolError: Can't send data when our state is ERROR

this happens when a network socket gets closed by someone (os, router, whatever) since it was idle for too long. silly gradio does not know how to handle keepalive correctly - they're still fumbling around with that. but since it happens at the very end when the server is trying to tell the browser it's done, you should already have the saved model by then.

class 'KeyError'> 'state_dict'

this seems pretty fundamental as state_dict is pretty much the core of the sd model.

DirtyHamster commented 1 year ago

Ok good to know on ours for the next large size merge batch I try. I just found it odd that I got an unusable model the first time.

On the: class 'KeyError'> 'state_dict' I spotted the same error listed in this discussion for them: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui/discussions/11 There isn't a lot of commentary going on in their discussions section though for the gui.

I noticed in the comments of another related repo that some models don't have a 'state_dict' wrapper and that it's normally just skipped as a layer, with the code going directly to the keys when that happens. So it might just be an issue with the gui implementation of it. Spotted that here: https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion/issues/31. Not really sure yet, as the model versions I tried weren't from entirely different concepts or even makers. I have some other thoughts on this further down:

Lost my source link for this: some models don't work as an "A" input but will work as a "B" input, causing <class 'KeyError'> 'model_ema.decay'.

On a correct run it looks like the output should be similar to this: https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion/issues/24

An OOM-type crash looked more like this: https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion/issues/14. I was also looking through some of the initial chatter from when ogkalu2 was working on the PermutationSpecs, https://github.com/samuela/git-re-basin/issues/5: "The test i have running has 2 dreambooth models pruned to 2GB. The bigger the size of the models, the higher the RAM usage. I didn't realize 4gb models were too much for 32 GB ram systems currently. The problem is the linear sum assignment. It can only run on the CPU." Apparently someone kept OOMing at 32GB of RAM with half the model size I was trying to push through. Since that's the same amount I'm running, that's a strong possibility with the full models I was trying to start with, even though the errors were different. In the GUI, the difference in errors could potentially just reflect where the runs were in the process when the OOM occurred. The GUI doesn't have much documentation.

There are some suggestions of removing the EMA weights first, so the baked-in vae could also be causing issues: https://github.com/ogkalu2/Merge-Stable-Diffusion-models-without-distortion/issues/19
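If pruning does turn out to be the fix, the EMA weights are easy enough to strip by hand; a rough sketch (prefix as seen in the KeyErrors above, filenames as used earlier):

```python
# Rough sketch: drop the EMA weights (model_ema.* keys) before merging.
import torch

ckpt = torch.load("HD-22-fp32-fixclip.ckpt", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)
pruned = {k: v for k, v in sd.items() if not k.startswith("model_ema.")}
torch.save({"state_dict": pruned}, "HD-22-fp32-fixclip-no-ema.ckpt")
```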

I dl'd a different full fp32 model last night just in case some of the issues might be caused by the HD model. So I'm going to try the A and B swap, different models, and a few other odds and ends such as pruning before re-trying the CLI version. I do kind of get the feeling, while reading through, that it's not far enough along yet to be added in without more dev work, just from the number of errors being asked about; which could be why there are unanswered requests for extensions of it. I am really curious if it turns out to just be a pruning issue, as that seems fairly rational. If that's the case I can still do a partial test for it until it's better optimized, if it can be optimized as-is. Still have a fair bit more to read over on it too.

If that doesn't work I'll pause to do the initial look book for merge comparisons using ours, as I'd like to get that posted, and then check out Aptronymist's suggestion of trying the other automated merges: https://github.com/s1dlx/sd-webui-bayesian-merger. Along with brunogcar's suggestion: https://github.com/Xerxemi/sdweb-auto-MBW, though that has an open issue for a fork version for here: https://github.com/Xerxemi/sdweb-auto-MBW/issues/19. The UI difference between what we already have and that one is a lot larger. Both of these do handle a fair number of concepts our default method does not. I can always check back in on the git-re-basin method as I go.

I'm going to clean up some of my posts in the thread over the next few days too, just to keep it readable, and try some of the other stuff I mentioned in regard to pruning and other sizes... I think I'm also going to fix the title to something like "Potential improvements for model merge & merge tests - git-re-basin and others" when I get done cleaning it up.

Note - it looks like, to get to the pruned 2gb state reported as working, the models have to be fp16, as fp32 pruning would give 4gb.

Got up to:
pruned a no-EMA version of hd22_fp16; extracted a copy of dreamshaper5's baked-in vae and pruned a no-EMA fp16 version of it. Will attempt the merge later.

With pruned files: Run 1:

        ---------------------
            model_a:    HD-22-fp16-fixclip-prunded-no-ema.ckpt
            model_b:    dreamshaper_fp16_novae_no_ema_pruned.ckpt
            output:     HD-22-fp16-fixclip-prunded-no-ema_0.1_dreamshaper_fp16_novae_no_ema_pruned_0.9_1it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 10
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.01

FINDING PERMUTATIONS P_bg324: 0.5 <class 'KeyError'> 'first_stage_model.encoder.norm_out.weight'

With pruned files: run 2 switching A & B

        ---------------------
            model_a:    dreamshaper_fp16_novae_no_ema_pruned.ckpt
            model_b:    HD-22-fp16-fixclip-prunded-no-ema.ckpt
            output:     dreamshaper_fp16_novae_no_ema_pruned_0.1_HD-22-fp16-fixclip-prunded-no-ema_0.9_10it_fp16.ckpt
            alpha:      0.1
            usefp16:    True  
            iterations: 10
        ---------------------

Using half precision

            ---------------------
                ITERATION 1
            ---------------------

new alpha = 0.01

<class 'KeyError'> 'first_stage_model.decoder.conv_in.bias'

This second test didn't even calc before spitting out an error. I ran it again after restarting the gui just to be sure.

Note - with the smaller file size I'm not thinking this is an indication of OOM. To check, I reran both tests watching vram in task manager; no indication of OOM, so I can rule that out at least. It's interesting that I'm getting so many different errors. I'm going to try swapping out the models next.

DirtyHamster commented 1 year ago

Counterfeit-V3.0_full_fp32 is the next one I'm going to try with this. I'll prune it down to no-vae, no-EMA and try the method against both of the models already tried above. I might do one more attempt after that before gathering and filing error reports and trying the cli version. I've been thinking about this all weekend.

s1dlx commented 1 year ago

possibly you could be interested in https://github.com/s1dlx/meh where we have a re-basin implementation running in this branch, optimised for low VRAM use: https://github.com/s1dlx/meh/pull/15

DirtyHamster commented 1 year ago

@s1dlx thanks I'll take a look at it. The one I've been trying has been throwing nothing but errors at me.

DirtyHamster commented 1 year ago

@vladmandic I finished the first batch of merges for fully weighted fp32s. Model 1 at 1 is the unmerged version, and model 2 at 1 is also the unmerged version; the rest of the spread of merges goes from .1 to .9 using the weighted sum interpolation method. Images below.

I was looking over the repo that @s1dlx posted a few messages above and saw that they have 3 fp16 models selected for their tests. So when I get down to the fp16s, I'll include the 2 models they used for the weighted sum interpolation method in our tests, with the same methodology. I really dig the weighted subtraction and multiply difference methods in their examples: https://github.com/s1dlx/meh/wiki. @s1dlx are you going to post examples of your implementation of re-basin in your wiki?

I have to admit I like their prompt so I'm going to use that as a secondary to the one for the woman so the tests can match up better further along.

Prompts: "a headshot photographic portrait of a woman", "a cat as a DJ at the turntables". Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1897848000, Size: 512x716

Test 1: Model 1: HD22_clipfixed_fp32 Model 2: DreamShaper_5 BakedVae_fp32 Vae Used: Extracted version of DreamShaper_5's

Images snipped for readability (regenerating images as grids).

vladmandic commented 1 year ago

i'd consider this a success. can you give me some metrics on the actual merge process? memory requirements, duration, etc...

s1dlx commented 1 year ago

@DirtyHamster yes, rebasin examples will follow

DirtyHamster commented 1 year ago

Finished up the first batch of merges for the second set. I previously extracted the Vae from the fp32 dreamshaper model to use in the tests. Keeping with this, I first converted both models down to fp16 and then extracted the Vae. Out of curiosity I generated a single image using "a cat as a DJ at the turntables" for each of the extracted Vae versions, as well as doing a single 0.5 merge of A: GhostMix (civitai) x B: Lyriel (civitai) at 0.5 M, thinking this could be directly compared to the wiki test image here: https://github.com/s1dlx/meh/wiki

dreamshaper_fp16 at .5 with the fp32 extracted baked Vae (much closer to test 1): 02902-1897848000-a cat as a DJ at the turntables

dreamshaper_fp16 at .5 with the fp16 extracted baked Vae: 02901-1897848000-a cat as a DJ at the turntables

lyriel_fp16 at .5 with its fp16 extracted baked Vae from ghostmix (note - got a different image from what's on the wiki): 02907-1897848000-a cat as a DJ at the turntables

without the baked Vae: 02908-1897848000-a cat as a DJ at the turntables

There could be a few different reasons why this is generating a different image: I'm not using xformers, different model versions perhaps, or a different math method for the weighted sum merge? Ours is listed as A * (1 - M) + B * M, and s1dlx's is given as a formula image in their wiki.

@s1dlx to clear up the question of model difference, I'm using GhostMix v2.0 (baked Vae) and Lyriel v1.6; which ones are you using in your tests? Are xformers on, or anything else that might cause a non-repeatable image? Just have to check; it probably just comes down to the math. I think the metrics question 2 messages up was for you as well.

@vladmandic What do you think of this? I wanted to check at first before running 8 more merges on the extra 2 models.

A little extra experiment: I then attempted to bake the dreamshaper_fp32 Vae into the .5 dreamshaper_fp16 merge, set the Vae selection to none, and reran the same prompt. (This results in an image much closer to using the fp32 Vae externally, but with some slight differences. The resulting model size is the same with the fp32 Vae baked in as without, with a noticeable quality difference.) 02903-1897848000-a cat as a DJ at the turntables

From here I got more curious, and instead of converting the individual models down to fp16 and then merging them, I converted an existing .5 merge directly from fp32 to fp16. This resulted in a model that was larger, but short of double the size of merging the individually converted ones: 3,761,511KB vs 5,823,497KB. Without using a vae, the image is closer to the fp32 merge that used the correct exported fp32 vae, yet still different. 02905-1897848000-a cat as a DJ at the turntables Applying the correct extracted fp32 Vae to this fp32-to-fp16 conversion also nets a slightly different image: 02906-1897848000-a cat as a DJ at the turntables

I would have thought these last 2 convert-and-merge paths would have resulted in the same size models, but there's a fair size difference.
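For what it's worth, a straight precision conversion is only a per-tensor cast, so any size gap beyond that presumably comes from extra keys (EMA or junk data) that one file still carries and the other doesn't. A sketch of the cast itself (filenames are placeholders):

```python
# Sketch: cast a checkpoint's float tensors to fp16. Size differences beyond the cast
# itself would have to come from extra keys (EMA, junk data) still present in one file.
import torch

ckpt = torch.load("merged_fp32.ckpt", map_location="cpu")  # placeholder filename
sd = ckpt.get("state_dict", ckpt)
half = {
    k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
    for k, v in sd.items()
}
torch.save({"state_dict": half}, "merged_fp16.ckpt")
```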

vladmandic commented 1 year ago

regarding size, you're saying that 0.5 + 0.5 < (1 + 1) / 2 :) and we're not even talking about quaternions, those are supposed to be normal rational numbers :)

regarding quality, first big surprise is how much better model-fp16 + vae-fp32 looks than model-fp16 + vae-fp16! ok, maybe i shouldn't be surprised since most NaN issues are due to rounding down to zero, fp32 has far more mantissa bits, and vaes in general are low-weight networks.

second surprise is the difference in mixing results from the wiki page. no, this is too much to be xformers (although i don't think they play here at all). far more likely due to completely different math - don't know which formulas are actually applied, but the quoted formulas are definitely different.

btw, a bit off-topic, you can use cli/image-grid.py --horizontal to create nice horizontal grids to compare stuff instead of pasting image-by-image, it would be cleaner to see the differences.

DirtyHamster commented 1 year ago

@vladmandic Regarding size I just found it rather quirky. I expected there might be some difference but not as much as there was. I was staring at them in file explorer going wait, what, why? :D I thought maybe it might relate to prune-able data not being treated the same but I'm kind of still scratching my head over it.

(image: file size comparison snipped)

What kind of got me with model-fp16 + vae-fp32 was that when I baked it in, I thought it would degrade in quality down closer to the fp16 vae, but it was still higher quality than the fp16 vae; nor was there the file size difference you'd expect, given the vae at fp32 is 2x the size like it should be.

Hadn't thought too much about doing a grid, other than how large it would be with 10 images. Half the time I see them displayed they have no link to the full image, so they can get hard to see. I was just opening the image link in a second window and scrolling up to the desired image to do the compare. I'll generate the next set as a grid and then you can decide if I should redo the others, I don't mind.

These are the two main ones from my last post: (Have to do the other prompt still.)

HD22Xdreamshaper-model_fp16_vae_fp16 xyz_grid-0000-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp16

HD22Xdreamshaper-model_fp16_vae_fp32 xyz_grid-0001-1897848000-a cat as a DJ at the turntables H22xdreamshaper-model_fp16_vae_fp32

Let me know if you like them better as the grid?

vladmandic commented 1 year ago

Regarding size I just found it rather quirky

i'd say far more than rather quirky - makes me question what the hell is going on there? but it's undeniable that a) fp16 is sufficient for a model, b) fp32 vae is far better. even if everything else is a fail (which it's not), that is a really good piece of information.

and a bit off-topic, i understand in early days that models did not have vae baked in, but nowadays any newly released model without baked in vae? i just don't want to bother :)

Let me know if you like them better as the grid?

oh, definitely, this makes it very easy to visualize both differences (e.g. by going back and forth on fp16 vs fp32) and compare the progress of merge itself.

you can decide if I should redo the others, I don't mind.

no need for that :)

btw, totally off-topic, since you've already mentioned the lyriel model before (and that's one of my favorites), i'd be curious (and this is just personal) how it merges with something like https://civitai.com/models/18798/meinaunreal

but back on-topic - how fast is this merge compared to basic method and what are you seeing regarding memory requirements?

why? i'm being tempted to perhaps do this purely in-memory - have sdnext load both models and then have a slider for merge to be used in near-real time without actually saving/loading merged model.
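roughly what i have in mind, as a sketch only: keep both state dicts resident and re-interpolate into the live model whenever the slider moves (names here are illustrative, not actual sdnext internals):

```python
# illustrative sketch of an in-memory merge slider: keep both state_dicts in RAM
# and re-interpolate into the already-loaded model when the multiplier changes.
# `model` stands in for the loaded pipeline's torch module; filenames are placeholders.
import torch

theta_a = torch.load("model_a.ckpt", map_location="cpu")["state_dict"]
theta_b = torch.load("model_b.ckpt", map_location="cpu")["state_dict"]

def apply_merge(model: torch.nn.Module, multiplier: float) -> None:
    merged = {
        k: (1.0 - multiplier) * theta_a[k] + multiplier * theta_b[k]
        for k in theta_a.keys() & theta_b.keys()
    }
    # push the blend into the live model; nothing is written to or reloaded from disk
    model.load_state_dict(merged, strict=False)

# e.g. wired to a ui slider callback:
# apply_merge(loaded_model, multiplier=0.35)
```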

ljleb commented 1 year ago

There could be a few different reasons why this is generating a different image I'm not using xformers, different model versions perhaps, different math method for weighted sum merge?

I am also getting different images from both the wiki and your results, using sd-meh to merge. It may be related to hardware or driver version. I ran the weighted sum merge on an RTX 3080.

s1dlx commented 1 year ago

The models I’m using have the following hashes

lyren.safetensors [ec6f68ea63]
Realistic_Vision_V1.4.safetensors [21c6d51e3e]
ghostmix.safetensors [d7465e52e1]

And I run PyTorch 1.x with cuda 10.2 and 4xx drivers on a 1080ti card

DirtyHamster commented 1 year ago

@vladmandic The fp32 vae bit I think is really useful; it kind of has me wanting to sort back through my models and export all the baked vaes for testing with other smaller models. For the most part, yeah, there are tons of original models that don't have them. I mess with them mostly to get a better feel for where a lot of the newer merges got started and to play with some back-training.

I'll do a batch for lyriel x mein later I have both downloaded already.

For s1dlx's: I just got past the last error this morning and got it working. Luckily, unlike the others, I could see the backtrace for missing components and other CLI complaints. I had some path issues and some components that needed fixing, though I'm not sure why I had path errors. I'm using it via a gui helper, https://github.com/axsddlr/sd-gui-meh, and I think most of the errors were probably coming from getting that set up. I don't see the re-basin option in the helper's drop-down window though, and I can see it's not listed in their settings.py file. Might just have to ask over there to see if they can add it in.

On my first run with it I did his weighted sum interpolation with the different calcs on the HD22_fp16 model; RAM requirements seemed to match up to 2x the model size, so for 3.7gb models it was around 7gb of memory at the highest. I didn't see VRAM move at all while running it. On the second run, using the larger HD22_fp32 model (roughly 8gb), it jumped to 23gb RAM usage, no VRAM usage, and the runtime was 9m39s for the larger merge. Time-wise I don't see that as bad for the model size. I didn't think about timing the shorter run, but I can always rerun it.

I think putting it all into memory is a great idea so long as it's an option. I only have 32gb at the moment, so with everything else I run in the background it would be a bit of a squeeze. If a user had 64gb or higher I don't think there would be any issues doing it that way.

@ljleb @s1dlx ok, so we have different hashes on ghostmix, which might mean a different version. The hash on mine is e3edb8a26f; I'm going to investigate that first by grabbing each version to see if that resolves the difference.

DirtyHamster commented 1 year ago

@s1dlx The Ghostmix you have is version 1.2; this could be where the difference is coming from. I'm going to redo the merge and see if that fixes it. https://civitai.com/models/36520?modelVersionId=59685

vladmandic commented 1 year ago

btw, there are significant differences in torch 2.x and torch 1.x as well, so i wouldn't be surprised if those add up.

DirtyHamster commented 1 year ago

@vladmandic Ok, it worked; it's not the math.

02936-1897848000-a cat as a DJ at the turntables

DirtyHamster commented 1 year ago

@vladmandic From my first run of s1dlx's vs yours (1st and 3rd are from his, 2nd and 4th are ours). On a side note, sitting here laughing at myself: I just noticed the funniest thing as I sat here thinking the CLIP skip moved from 1 to 4 unexpectedly. I had apparently bumped the clip to 4 and didn't notice while running off all the other images. I had restarted the computer this morning while fixing the errors holding up the other side of testing, so my settings were consistent until then. That caused the image difference on his too. @s1dlx sorry about that confusion.

I was like what's with that strange cat this isn't right... lol... Quality wise I don't see any issues.

Clip 1: xyz_grid-0002-1897848000-a cat as a DJ at the turntables

Clip 4: xyz_grid-0003-1897848000-a cat as a DJ at the turntables

I posted a note to the meh gui project to see if they can add in the few missing lines; just waiting on that so I can export the re-basin merges.

s1dlx commented 1 year ago

@DirtyHamster you can do all the tests from the sd-meh cli. Or am I missing some use case?

DirtyHamster commented 1 year ago

@s1dlx I kind of would like a small gui client I could run on the side, so it's beneficial for me to ask them regardless. Earlier I figured the cli would be a last resort, as up till this morning I hadn't gotten far enough along anyway.

DirtyHamster commented 1 year ago

@s1dlx So I gave just using the cli a shot, though I got some errors after stage 2. It kind of makes me laugh because I've gotten so many of them so far.

H:\Users\adamf\AI_Progs\sd-meh-merge\meh>merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fp16-fixclip.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper_5BakedVae_fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\merge01_1.safetensors -f safetensors -ba .1 -bb .9 -rb -rbi 1

```
loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fp16-fixclip.safetensors
loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper_5BakedVae_fp16.safetensors
permuting 0
stage 1: 100%|████████████████████████████████████████| 1831/1831 [02:23<00:00, 12.77it/s]
stage 2: 100%|████████████████████████████████████████| 1819/1819 [00:00<00:00, 1818264.77it/s]
Traceback (most recent call last):
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 94, in <module>
    main()
  File "C:\Users\adamf\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\adamf\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\adamf\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\adamf\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 79, in main
    merged = merge_models(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 89, in merge_models
    return rebasin_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 176, in rebasin_merge
    thetas["model_a"] = apply_permutation(perm_spec, perm_1, thetas["model_a"])
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2192, in apply_permutation
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2192, in <dictcomp>
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2178, in get_permuted_param
    for axis, p in enumerate(ps.axes_to_perm[k]):
KeyError: 'model_ema.diffusion_modelinput_blocks00bias'
```
s1dlx commented 1 year ago

@DirtyHamster looks like one of the models is missing a key…can you check whether you have a broken one?

DirtyHamster commented 1 year ago

@s1dlx ok, I think I've got it: the model has to be pruned no-ema; it's not that it's missing something.
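
For anyone hitting the same KeyError, a minimal sketch of that kind of no-ema prune on a safetensors file (illustrative only, not sd-meh's internal pruning; it just drops the model_ema.* keys that triggered the error above, and the output filename is made up):

```python
from safetensors.torch import load_file, save_file


def prune_ema(src, dst):
    """Drop the model_ema.* tensors so the merge only sees the live weights."""
    state = load_file(src)
    kept = {k: v for k, v in state.items() if not k.startswith("model_ema.")}
    print(f"dropped {len(state) - len(kept)} EMA tensors")
    save_file(kept, dst)


# Hypothetical filenames; point these at your own checkpoints.
prune_ema("HD-22-fp16-fixclip.safetensors", "HD-22-fp16-noema.safetensors")
```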

DirtyHamster commented 1 year ago

@vladmandic ok, so I got it working; it has to be a 0.5:0.5 merge. At first I didn't think about this and figured I could weight it slightly towards model a or b, but I quickly realized my mistake. Doing that causes a funky error where it just outputs model a for iteration 1 and then does a small permutation of model b for every iteration after. If this goes in, there will probably need to be some kind of check when the models are loaded to make sure they are pruned, otherwise the user will get an error festival like I had; perhaps just disable the merge button with a warning or something similar if a model isn't pruned. Similarly, the weights could be preset to 0.5 when the re-basin method is selected.
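
Something along these lines could serve as that load-time check (a sketch only, not existing SD.Next code; "pruned" here just means no model_ema.* keys, and the multiplier rule is the 0.5:0.5 behaviour observed above):

```python
from safetensors import safe_open


def rebasin_preflight(path_a, path_b, multiplier):
    """Collect warnings a UI could use to disable the merge button for re-basin."""
    problems = []
    for path in (path_a, path_b):
        with safe_open(path, framework="pt") as f:
            if any(k.startswith("model_ema.") for k in f.keys()):
                problems.append(f"{path}: not pruned (EMA keys present)")
    if abs(multiplier - 0.5) > 1e-6:
        problems.append("re-basin merge expects a 0.5:0.5 multiplier")
    return problems


# Hypothetical usage with made-up paths
for warning in rebasin_preflight("model_a.safetensors", "model_b.safetensors", 0.5):
    print("warning:", warning)
```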

fp16: Test

xyz_grid-0016-1897848000-a cat as a DJ at the turntables

xyz_grid-0018-1897848000-a headshot photographic portrait of a woman

When I was looking at the test images, it looked like it could probably go further iteration-wise; the set of the woman was still picking up more details in her clothing. Similarly, the cat one in the last few iterations looked like it was about to change the arm stripes a bit. I'd like to try going past 10 iterations too, probably in increments of 5. What I'd hope to find is the point where two consecutive increments of 5 iterations don't show a new permutation or an increase in detail, since it's hard to tell where it should stop. I actually wonder if it could be made to auto-save a version every so many increments, similar to how you monitor, stop, and resume from a training checkpoint? That would make it a lot easier to get to higher iteration values.
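
If the iteration loop were driven from Python rather than one long CLI run, the auto-save could look roughly like this (a hypothetical sketch; step_fn and average_step are stand-ins, not real sd-meh functions):

```python
from safetensors.torch import save_file


def run_with_checkpoints(theta_a, theta_b, step_fn, total_iters=100, save_every=5,
                         prefix="rebasin"):
    """Run merge iterations and write an intermediate model every `save_every` steps."""
    merged = dict(theta_a)
    for i in range(1, total_iters + 1):
        merged = step_fn(merged, theta_b)  # one permutation/merge pass (placeholder)
        if i % save_every == 0:
            save_file(merged, f"{prefix}_iter{i:03d}.safetensors")
            print(f"saved intermediate result at iteration {i}")
    return merged


# Placeholder step function: a plain 0.5:0.5 average standing in for the real re-basin pass.
def average_step(a, b):
    return {k: a[k] * 0.5 + b[k] * 0.5 for k in a}
```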

The highest RAM usage I saw peaked around 21GB, but I'll get a better reading later since I have a number of odds and ends open; my best guess is the merge itself was using around 12GB of that 21, based on what I'm sitting at right now. I'll also do a timing test, but 1 to 3 iterations was really pretty fast and 10 didn't feel much longer than what I currently see with larger models. So it's potentially possible to run this method for fp16 entirely from RAM, though I'd still want it to be an option.

fp32 might just be possible with 32GB of RAM, but I want to try that after a fresh restart. I'll probably get to it tomorrow or the day after; I have to prep the files for it and a few odds and ends.

s1dlx commented 1 year ago

@DirtyHamster re-basin needs a huge amount of VRAM (it keeps at least one copy of model_a in memory)

therefore in sd-meh we implemented the --prune mode, where we prune out all the extra stuff we don't need for re-basin; it also casts everything to fp16 for extra compression. Once re-basin is run, we re-inflate to fp32 and add back the bits we removed, so the final model size doesn't change

This way we manage to do re-basin with add_difference on gpu with 12GB of VRAM

give that flag a try, basically --prune --p 16 --rb --rbi <rebasin iterations>

DirtyHamster commented 1 year ago

@s1dlx I noticed the chatter about VRAM usage when reading through the various implementations while looking up the errors I was getting. I spotted the -prune option but at the time just figured it was a standard prune. So if I do -prune -p 32 -rb -rbi, will it compress to fp16 and then output at fp32? I just want to make sure I'm getting that right.
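
Just so I'm picturing that flow right, roughly this, in sketch form (my own reading, not sd-meh's actual code; I'm using the model_ema.* keys to stand in for the "extra stuff" that gets pruned, and which model's removed tensors go back in at the end is a guess):

```python
from safetensors.torch import load_file, save_file


def merge_with_prune(path_a, path_b, out_path, merge_fn):
    """Slim the inputs (drop EMA, cast to fp16), merge, then re-inflate and restore."""
    def slim(state):
        kept, removed = {}, {}
        for k, v in state.items():
            if k.startswith("model_ema."):
                removed[k] = v  # set aside, re-attached after merging
            else:
                kept[k] = v.half() if v.is_floating_point() else v
        return kept, removed

    a_slim, a_removed = slim(load_file(path_a))
    b_slim, _ = slim(load_file(path_b))

    merged = merge_fn(a_slim, b_slim)  # stand-in for the actual re-basin merge
    merged = {k: (v.float() if v.is_floating_point() else v) for k, v in merged.items()}
    merged.update(a_removed)  # put the removed tensors back so the output size is unchanged
    save_file(merged, out_path)
```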

s1dlx commented 1 year ago

> @s1dlx I noticed the chatter about VRAM usage when reading through the various implementations while looking up the errors I was getting. I spotted the -prune option but at the time just figured it was a standard prune. So if I do -prune -p 32 -rb -rbi, will it compress to fp16 and then output at fp32? I just want to make sure I'm getting that right.

Yes correct that’s the intended effect

the VRAM logging is there to check if models fit

DirtyHamster commented 1 year ago

@s1dlx Ok, good to know; I'll get started on the other tests. Side note: I talked to the person doing the standalone gui for meh, and he's going to take a look at it in July. If you want to leave him any notes on putting it in, now's a great chance to do it: https://github.com/axsddlr/sd-gui-meh/issues/6. If you don't want to, I can pass the info on to him; that's fine also.

Other questions I have:

  1. Just a small follow-up on -prune: if I put in a full 8GB trainable-state model, will it trim out the EMA? I'm just curious, since some think of the no-ema model as being a full model. Does the EMA go back in, even though it's kind of invalid training data after a merge? i.e., does a cli run with -prune -p 32 -rb -rbi come out as an fp32 no-ema prune? I wasn't as clear as I should have been when asking before.

  2. Do you have any good recommendations on max iteration values to run for this? I'm planning on doing some testing on this after the next two test phases, probably increasing the value in increments of 5. Have you found anything that might degrade the model past x iterations that I should be aware of while testing?

  3. 3 posts above I was wondering whether it's possible to start and stop a re-basin merge with an incremental save output for quality monitoring. Knowing the code, do you think an implementation of something like that would be possible? I can see issues with setting the iteration value to 100 and then having to use the system for something else midway through: if I'm forced to stop at iteration 45, there's no nearest output to save from and I'd lose all the work the computer did up to that point. I have to ask, as I think ahead on stuff like this.

Thanks again, you've made the testing a lot easier.