s1dlx / meh

Merging Execution Helper
MIT License

Error - I think dealing with models that have instruct pix2pix #38

Open DirtyHamster opened 1 year ago

DirtyHamster commented 1 year ago

@s1dlx I was attempting to do a re-basin merge with the following model: photosomnia_v16.instruct-pix2pix.safetensors. The only thing different about this model is that it has the pix2pix architecture, so I was thinking that might be the cause of the issue. I was doing one of my follow-up tests. I have to check inpainting too.

CLI prompt: H:\Users\adamf\AI_Progs\sd-meh-merge\meh>merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\test\00-regit-pwddxWDP1x5EV5SMA5URPM440EMxdream6rellyrevebmev2_dblback2_20prune.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -m weighted_sum -p 32 -o H:\Users\adamf\AI_Progs\AI_Models\test\00-regit-pwddxWDP1x5EV5SMA5URPM440EMxdream6rellyrevebmev2_dblback2pix_20prune -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 20

I get the following error:

before loading models: 0.000
loading: H:\Users\adamf\AI_Progs\AI_Models\test\00-regit-pwddxWDP1x5EV5SMA5URPM440EMxdream6rellyrevebmev2_dblback2_20prune.safetensors
loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
models loaded: 0.000
permuting 0 iteration start: 0.000
weights & bases, before simple merge: 0.000
stage 1: 100%|█████████▉| 1130/1131 [00:42<00:00, 26.46it/s]
Traceback (most recent call last):
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 151, in <module>
    main()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 132, in main
    merged = merge_models(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 146, in merge_models
    merged = rebasin_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 286, in rebasin_merge
    thetas["model_a"] = simple_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 244, in simple_merge
    res.result()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 342, in simple_merge_key
    with merge_key_context(key, thetas, *args, **kwargs) as result:
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 428, in merge_key_context
    result = merge_key(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 403, in merge_key
    merged_key = merge_method(**merge_args).to(storage_device)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge_methods.py", line 27, in weighted_sum
    return (1 - alpha) * a + alpha * b
RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 1

s1dlx commented 1 year ago

I'm not familiar with the pix2pix architecture, but it seems that you have a layer with a different size from the corresponding one in the other model, hence it's not possible to merge those
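For reference, the mismatch is easy to reproduce in isolation; a minimal sketch in plain PyTorch (not sd-meh code) of why weighted_sum trips over the pix2pix conv_in layer:

# The instruct-pix2pix UNet conv_in takes 8 input channels (4 latent + 4
# image conditioning), while a normal SD UNet takes 4, so the element-wise
# interpolation cannot broadcast.
import torch

alpha = 0.5
a = torch.randn(320, 4, 3, 3)   # normal model: model.diffusion_model.input_blocks.0.0.weight
b = torch.randn(320, 8, 3, 3)   # instruct-pix2pix model: same key, 8 input channels

try:
    merged = (1 - alpha) * a + alpha * b   # same expression weighted_sum uses
except RuntimeError as e:
    print(e)  # "The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 1"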

also, you are running re-basin on the two, which may further complicate the situation

try a plain weighted_sum merge of the two; do you get the same error?

DirtyHamster commented 1 year ago

Same error when just dropping the re-basin args and running it as a normal weighted_sum. I'm not that familiar with the architecture of it either.

https://huggingface.co/blog/instruction-tuning-sd
https://www.tensorflow.org/tutorials/generative/pix2pix

I'm actually curious: if the layer isn't present in both models, then perhaps it could just be pruned and added back later, so the merge retains the functionality.

H:\Users\adamf\AI_Progs\sd-meh-merge\meh>merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\test\00-regit-pwddxWDP1x5EV5SMA5URPM440EMxdream6rellyrevebm_10prune.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -m weighted_sum -p 32 -o H:\Users\adamf\AI_Progs\AI_Models\test\testweightsum -f safetensors -ba 0.5 -bb 0.5 -pr

before loading models: 0.000
loading: H:\Users\adamf\AI_Progs\AI_Models\test\00-regit-pwddxWDP1x5EV5SMA5URPM440EMxdream6rellyrevebm_10prune.safetensors
loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
models loaded: 0.000
stage 1: 100%|█████████▉| 1130/1131 [01:30<00:00, 12.54it/s]
Traceback (most recent call last):
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 151, in <module>
    main()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 132, in main
    merged = merge_models(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 162, in merge_models
    merged = simple_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 244, in simple_merge
    res.result()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 342, in simple_merge_key
    with merge_key_context(key, thetas, *args, **kwargs) as result:
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 428, in merge_key_context
    result = merge_key(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 403, in merge_key
    merged_key = merge_method(**merge_args).to(storage_device)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge_methods.py", line 27, in weighted_sum
    return (1 - alpha) * a + alpha * b
RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 1

DirtyHamster commented 1 year ago

@s1dlx I tried running it on the default merger via Vlad's and got the following error on his: RuntimeError: When merging instruct-pix2pix model with a normal one, A must be the instruct-pix2pix model.

I tried swapping A and B on yours but got the same error as in the previous message. Swapping them did work on Vlad's, so there might be some code in there you can look at to see how it handles the pix2pix architecture, at least for a normal merge.

Full read out:

23:51:36-222934 INFO Model merge loading secondary model: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
Loading weights: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
23:51:37-917656 INFO Model merge loading primary model: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fp16-fixclip-noema.safetensors
Loading weights: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fp16-fixclip-noema.safetensors
23:51:41-717749 INFO Model merge: running
39%|███ | 450/1143 [00:09<00:14, 47.69it/s]
23:51:51-155872 ERROR model merge: RuntimeError
H:\Users\adamf\AI_Progs\VladDiffusion\automatic\modules\ui_models.py:100 in modelmerger
    99                try:
  ❱ 100                    results = extras.run_modelmerger(*args)
   101                except Exception as e:
H:\Users\adamf\AI_Progs\VladDiffusion\automatic\modules\extras.py:154 in run_modelmerger
   153                if a.shape[1] == 4 and b.shape[1] == 8:
  ❱ 154                    raise RuntimeError("When merging instruct-pix2pix model with a norma…
   155                if a.shape[1] == 8 and b.shape[1] == 4:  # If we have an Instruct-Pix2Pix m…
RuntimeError: When merging instruct-pix2pix model with a normal one, A must be the instruct-pix2pix model.
23:51:58-233415 INFO Available models: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion 117
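For what it's worth, the special case visible in that extras.py traceback appears to be: merge only the first 4 input channels of the conv_in key and keep model A's extra pix2pix channels. A hedged paraphrase (the helper name theta_func2 and the multiplier argument are from that codebase; details here are an approximation, not sd-meh code):

import torch

def merge_key_webui_style(a, b, multiplier, theta_func2):
    # a: tensor from model A (instruct-pix2pix conv_in, 8 input channels)
    # b: tensor from model B (normal SD conv_in, 4 input channels)
    if a.shape[1] == 4 and b.shape[1] == 8:
        raise RuntimeError("When merging instruct-pix2pix model with a normal one, "
                           "A must be the instruct-pix2pix model.")
    if a.shape[1] == 8 and b.shape[1] == 4:
        merged = a.clone()
        # merge only the 4 channels both models share; keep A's extra 4 as-is
        merged[:, 0:4, :, :] = theta_func2(a[:, 0:4, :, :], b, multiplier)
        return merged
    return theta_func2(a, b, multiplier)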

s1dlx commented 1 year ago

just to clarify: using pix2pix as model a doesn't work in either Vlad's or meh?

WhiteWipe commented 1 year ago

That extra layer is trained along with the rest of the weights. If this is anything like inpainting, just adding the layer to a normal model works, but causes a drop in quality. I am really curious whether there is a way, when merging, to modify the weights of the extra layer relative to the model being merged with the inpainting/pix2pix model, to improve quality. A question I can't answer: can git re-basin be repurposed for this use case, where the normal SD weights are merged normally, but the entire inpainting/pix2pix model is permuted towards the normal model separately, and then just the permuted layer is added to the merge at the end?

Edit: git re-basin could presume the normal SD model has the extra layer, but with zero weights.
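A rough sketch of that idea (an assumption about how it could be done, not something meh currently does): zero-pad the normal model's conv_in to 8 input channels so every key lines up before merging or re-basin.

import torch

CONV_IN_KEY = "model.diffusion_model.input_blocks.0.0.weight"

def pad_conv_in_to_pix2pix(theta):
    # theta: state dict of the normal SD model
    w = theta[CONV_IN_KEY]
    if w.shape[1] == 4:  # normal SD UNet: add 4 zeroed image-conditioning channels
        pad = torch.zeros(w.shape[0], 4, *w.shape[2:], dtype=w.dtype)
        theta[CONV_IN_KEY] = torch.cat([w, pad], dim=1)  # -> (320, 8, 3, 3)
    return theta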

DirtyHamster commented 1 year ago

just to clarify: using pix2pix as model a doesn't work in either Vlad's or meh?

It works in Vlad's only as model a. It doesn't work in meh as either a or b.

DirtyHamster commented 1 year ago

Sorry, an unintended mouse click closed the issue for a sec, lol.

DirtyHamster commented 1 year ago

That extra layer is trained along with the rest of the weights. If this is anything like inpainting, just adding the layer to a normal model works, but causes a drop in quality. I am really curious whether there is a way, when merging, to modify the weights of the extra layer relative to the model being merged with the inpainting/pix2pix model, to improve quality. A question I can't answer: can git re-basin be repurposed for this use case, where the normal SD weights are merged normally, but the entire inpainting/pix2pix model is permuted towards the normal model separately, and then just the permuted layer is added to the merge at the end?

Edit: git re-basin could presume the normal SD model has the extra layer, but with zero weights.

From what I'm reading, it looks like unless they fine-tuned their pix2pix model for extra functionality, the quality loss would be negligible, and it would only ever be as good as what the pix2pix base model can generate. https://huggingface.co/blog/instruction-tuning-sd From that and a few other small articles that weren't a ton of help, it seems like pix2pix is its own complete model that can be added in or run separately. Correct me if I'm reading into this wrong though.

So it probably could just be pruned out and added back in before saving. I'm not sure how a merge should handle two models that both have pix2pix, though, and I have yet to see if that generates any other errors. Similarly, I'm not sure how to identify whether a pix2pix model has been fine-tuned.

WhiteWipe commented 1 year ago

Looked into it just to be sure. It's the same story as an inpainting model: you take a model and train all of it with an extra layer on top. It should work just by appending the layer to a normal model, but there should still be some quality loss, simply because the extra layer is slightly out of sync with the normal weights. Nonetheless, I'll try to do a PR soon for meh for pix2pix (and inpainting, if that doesn't already work) and somebody can test how well the resulting merge works.
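A rough idea of what such a PR might special-case (a sketch under assumptions, not the actual sd-meh implementation): when one tensor has extra input channels (8 for pix2pix, 9 for inpainting, vs 4 for a normal SD model), merge the shared channels and carry the extras over unchanged.

import torch

def merge_mismatched_conv_in(a, b, alpha):
    # a, b: the same key from the two models; alpha: interpolation weight
    if a.dim() == 4 and b.dim() == 4 and a.shape[1] != b.shape[1]:
        n = min(a.shape[1], b.shape[1])          # channels both models share
        wide = a if a.shape[1] > b.shape[1] else b
        merged = wide.clone()                    # keep the extra channels as-is
        merged[:, :n] = (1 - alpha) * a[:, :n] + alpha * b[:, :n]
        return merged
    return (1 - alpha) * a + alpha * b           # plain weighted_sum otherwise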

DirtyHamster commented 1 year ago

@WhiteWipe I can retest it if you can fix those portions.

s1dlx commented 1 year ago

0.9.0 should fix this

DirtyHamster commented 1 year ago

Part 1: (Normal weighted Sum) Run as: merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr

This one produces an output now too. I'll try to test the two files tomorrow if I can.

Part 2: (re-basin error) Run as: merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 50

H:\Users\adamf\AI_Progs\sd-meh-merge\meh>merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 50
INFO: Assembling alpha w&b
INFO: base_alpha: 0.5
INFO: alpha weights: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
INFO: Loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
INFO: Loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors
INFO: start merging with weighted_sum method
INFO: Init rebasin iterations
INFO: Rebasin iteration 0

Per-key shape pairs printed during stage 1:

model.diffusion_model.input_blocks.0.0.weight torch.Size([320, 8, 3, 3]) torch.Size([320, 4, 3, 3])
model.diffusion_model.input_blocks.1.1.proj_in.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.proj_out.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.input_blocks.2.1.proj_in.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.proj_out.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.input_blocks.4.1.proj_in.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.proj_out.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.input_blocks.5.1.proj_in.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.proj_out.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.input_blocks.7.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.input_blocks.8.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.middle_block.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.10.1.proj_in.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.proj_out.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.output_blocks.11.1.proj_in.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.proj_out.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.output_blocks.3.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.4.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.5.1.proj_in.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.proj_out.weight torch.Size([1280, 1280, 1, 1]) torch.Size([1280, 1280])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight torch.Size([1280, 768]) torch.Size([1280, 1024])
model.diffusion_model.output_blocks.6.1.proj_in.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.proj_out.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.7.1.proj_in.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.proj_out.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.8.1.proj_in.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.proj_out.weight torch.Size([640, 640, 1, 1]) torch.Size([640, 640])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight torch.Size([640, 768]) torch.Size([640, 1024])
model.diffusion_model.output_blocks.9.1.proj_in.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.proj_out.weight torch.Size([320, 320, 1, 1]) torch.Size([320, 320])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight torch.Size([320, 768]) torch.Size([320, 1024])
model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight torch.Size([320, 768]) torch.Size([320, 1024])

stage 1: 100%|██████████| 1131/1131 [00:05<00:00, 190.52it/s]
stage 2: 100%|██████████| 1215/1215 [00:00<00:00, 27613.69it/s]

Traceback (most recent call last):
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 181, in <module>
    main()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 162, in main
    merged = merge_models(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 149, in merge_models
    merged = rebasin_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 319, in rebasin_merge
    thetas["model_a"] = apply_permutation(perm_spec, perm_1, thetas["model_a"])
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2197, in apply_permutation
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2197, in <dictcomp>
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2183, in get_permuted_param
    for axis, p in enumerate(ps.axes_to_perm[k]):
KeyError: 'cond_stage_model.model.ln_final.bias'

s1dlx commented 1 year ago

I see…I pushed a new fix, try pulling main now

DirtyHamster commented 1 year ago

I did try running the model that I made last night for pix2pix and got an error in vlad's:

07:39:33-463643 ERROR Error in onchange callback: sd_model_checkpoint 00-p2pfp16test.safetensors
Error(s) in loading state_dict for UNetModel: size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

I'll give it a try in a little bit. I have a few maintenance things to do.

DirtyHamster commented 1 year ago

(Re-basin retest) OK, different-looking output prior to the error this time. Looks like the same error, though.

Ran as: merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 10

Pre-error output:

H:\Users\adamf\AI_Progs\sd-meh-merge\meh>merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 10
INFO: Assembling alpha w&b
INFO: base_alpha: 0.5
INFO: alpha weights: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
INFO: Loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors
INFO: Loading: H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors
INFO: start merging with weighted_sum method
INFO: Init rebasin iterations
INFO: Rebasin iteration 0
stage 1: 100%|██████████| 1131/1131 [01:03<00:00, 17.95it/s]
stage 2: 100%|██████████| 1215/1215 [00:00<00:00, 26999.81it/s]

Traceback (most recent call last):
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 181, in <module>
    main()
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\adamf\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\merge_models.py", line 162, in main
    merged = merge_models(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 149, in merge_models
    merged = rebasin_merge(
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\merge.py", line 319, in rebasin_merge
    thetas["model_a"] = apply_permutation(perm_spec, perm_1, thetas["model_a"])
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2197, in apply_permutation
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2197, in <dictcomp>
    return {k: get_permuted_param(ps, perm, k, params) for k in params.keys()}
  File "H:\Users\adamf\AI_Progs\sd-meh-merge\meh\sd_meh\rebasin.py", line 2183, in get_permuted_param
    for axis, p in enumerate(ps.axes_to_perm[k]):
KeyError: 'cond_stage_model.model.ln_final.bias'

s1dlx commented 1 year ago

Mmm this is different

s1dlx commented 1 year ago

@DirtyHamster I'm not able to reproduce the error...

are you sure you are using the latest sd-meh? That would be 0.9.4

DirtyHamster commented 1 year ago

The Vlad's error was from doing the merge the night before, if that's what you're referring to. That would have been an earlier version.

The second re-basin error was with the newer version of meh; I put it in after the maintenance. I did a git clone of what you have up after blowing away the old folders, so it really shouldn't have a choice. I launched it from inside that new directory.

Edit: I just took care of this possibility and removed any other version I could find lurking; I still got the error. The only way an alternate version could be responding, that I can think of, would be via Python's path, but I didn't load Python before calling the file name in the directory. Any thoughts on that?

Edit 2: Unfortunately I just screwed something else up on my machine, not uncommon for me... Let me fix this before I continue. I apparently didn't fix all my pathing issues from when I was experimenting with running multiple Python versions. It looked like a great fix for something but was just a horrible idea.

DirtyHamster commented 1 year ago

I did clean installs of Python and the few other Python apps I use the most, and cleaned up some system paths I had missed the first time I did this earlier in the month. I still get the same error.

The most common line in the error points at: Python310\lib\site-packages\click\core.py

At the moment I'm thinking something has to be wrong with how the click module installed for me, or something similar; that's as far as I've gotten. It might be something as simple as a module missing from its requirements.txt that you picked up elsewhere. It could also be a difference in versions between us: if they've deprecated or changed any functions, that might also cause issues, which could be the reason it's working for you and not me. It's hard for me to say; I'm by far not an expert with Python, and I'd even say I'm fairly clumsy with it still, lol.

The easiest troubleshooting step we could start with would be the version of click: the click version I have is 8.1.6. What version do you have?

Looking around in core.py at the error lines mentioned (1434, 1157, 1078, 783), I noticed there were some developer notes mentioning changes from how version 8+ worked, though they didn't point to the exact lines listed above. Given that they sat both above and below some of those lines, I'm a little suspicious that could be a potential cause.

s1dlx commented 1 year ago

@DirtyHamster the last test I did was with click 8.1.3

DirtyHamster commented 1 year ago

OK, that probably wouldn't be it then; post version 8.1.4 they didn't even bother writing a changelog. Back to being stumped.

Was there any difference in the command-line prompt when it was working for you? Just to rule out something else that's easy. Perhaps the models I selected? Are you using different ones?

merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\photosomnia_v16.instruct-pix2pix.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\wd-ink-fp16.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test\p2pfp16test -f safetensors -ba 0.5 -bb 0.5 -pr -rb -rbi 10

s1dlx commented 1 year ago

I'm running a very simple command

python3 merge_models.py \
    -a ~/sd_models/img2imgPix2pix_v10Pruned.safetensors \
    -b ~/sd_models/lyriel_v15.safetensors \
    -m weighted_sum -ba 0.5 -rb -rbi 2

resulting in

INFO: Assembling alpha w&b
INFO: base_alpha: 0.5
INFO: alpha weights: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
INFO: Loading: ~/sd_models/img2imgPix2pix_v10Pruned.safetensors
INFO: Loading: ~/sd_models/lyriel_v15.safetensors
INFO: start merging with weighted_sum method
INFO: Init rebasin iterations
INFO: Rebasin iteration 0
stage 1: 100%|██████████████████████████████████████████████████████████████████| 1143/1143 [01:01<00:00, 18.71it/s]
stage 2: 100%|██████████████████████████████████████████████████████████████████| 1131/1131 [00:00<00:00, 380444.13it/s]
INFO: Rebasin iteration 1
stage 1: 100%|██████████████████████████████████████████████████████████████████| 1143/1143 [00:23<00:00, 48.44it/s]
stage 2: 100%|██████████████████████████████████████████████████████████████████| 1131/1131 [00:00<00:00, 1860297.19it/s]
INFO: Saving model_out

it's possibly due to the actual models. Where can I download yours?

DirtyHamster commented 1 year ago

I suppose I could try again without the two extra args in the command line, -pr and -bb, though that doesn't seem like a major difference. It's nice to know I can drop the -bb weight too. I'll try this a bit later, and also try some additional models as soon as I spot some with p2p in them.

The two I was using are:
https://civitai.com/models/18637?modelVersionId=29374 (this one has the pix2pix)
https://huggingface.co/waifu-diffusion/wd-1-5-beta3/blob/main/wd-ink-fp16.safetensors (this was a fairly random pick)


In regard to the error I got on loading the normal merged one:

07:39:33-463643 ERROR Error in onchange callback: sd_model_checkpoint 00-p2pfp16test.safetensors
Error(s) in loading state_dict for UNetModel: size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

I spotted a similar error on Reddit when reading about forcing inpainting into models; their error was this: RuntimeError: Error(s) in loading state_dict for LatentDiffusion: size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/
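Both errors come down to the conv_in shape of the saved checkpoint. A quick way to check what a given .safetensors file actually is (a small helper written here as an assumption about workflow, using the safetensors API; not part of meh):

from safetensors import safe_open

def conv_in_channels(path):
    # Inspect only the conv_in weight without loading the whole model
    with safe_open(path, framework="pt", device="cpu") as f:
        w = f.get_tensor("model.diffusion_model.input_blocks.0.0.weight")
    return w.shape[1]   # 4 = normal SD, 8 = instruct-pix2pix, 9 = inpainting

# e.g. conv_in_channels("photosomnia_v16.instruct-pix2pix.safetensors") should report 8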

They kind of state that the model name has to have -inpainting at the end of the file name (as theirs did; the equivalent inserted for us might be -pix2pix), as well as: "Checkpoint Merger options, in the -copy config from- I've chosen [Don't] - in practice this is equivalent (I believe... not tested) to deleting the *.yaml file for previously merged models."

I'm going to try that a bit later too (the name-change portion, as I don't have a way of just vaporizing the yaml).