hako-mikan / sd-webui-supermerger

model merge extension for stable diffusion web ui
GNU Affero General Public License v3.0

Issue merging LoRA with latest version #371

Open n15g opened 7 months ago

n15g commented 7 months ago

I updated the plugin recently and have found that attempts to merge any LoRA now throw the following IndexError:

Traceback (most recent call last):
  File "D:\Lab\sd\a1111-xl\extensions\sd-webui-supermerger\scripts\mergers\pluslora.py", line 426, in lmerge
    sd = merge_lora_models(ln, lr, settings, False, calc_precision)
  File "D:\Lab\sd\a1111-xl\extensions\sd-webui-supermerger\scripts\mergers\pluslora.py", line 484, in merge_lora_models
    ratio = ratios[blockfromkey(key, keylist, isv2)]
IndexError: list index out of range

I added debug output to find the problematic index, and it seems to be related to the switchover to the second text encoder. As soon as the second text encoder's layers are reached, blockfromkey returns index 27, which is past the end of the ratios list.

That's as far as I've managed to get, as I'm still familiarizing myself with the code.

From pluslora.py:483:

print(f"key: {key}, idx:{blockfromkey(key, keylist, isv2)}")
ratio = ratios[blockfromkey(key, keylist, isv2)]

Output:

key: lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_0_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_10_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_11_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_1_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_2_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_3_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_4_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_5_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_6_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_7_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_8_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_mlp_fc1.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_mlp_fc1.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_mlp_fc2.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_mlp_fc2.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_k_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_k_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_out_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_out_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_q_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_q_proj.lora_up.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_down.weight, idx:0
key: lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_up.weight, idx:0
key: lora_te2_text_model_encoder_layers_0_mlp_fc1.lora_down.weight, idx:27

n15g commented 7 months ago

I walked the commits back and the issue first appeared with this commit: 6f191f36baf36b17065d49baa551f7308f1b9d4e, https://github.com/hako-mikan/sd-webui-supermerger/commit/6f191f36baf36b17065d49baa551f7308f1b9d4e

bombel28 commented 7 months ago

Same here in Stable Diffusion Forge. Installed environment: version: f0.0.17v1.8.0rc-latest-276-g29be1da7  •  python: 3.10.6  •  torch: 2.1.2+cu121  •  xformers: 0.0.23.post1  •  gradio: 3.41.2

thePlott commented 7 months ago

I am having the same issue. I believe it has something to do with the diffusers version, but I could be wrong.

ultramint commented 7 months ago

I'm running into the same issue. Are you trying to merge SDXL LoRAs? The traceback gave me the same line: ratio = ratios[blockfromkey(key, keylist, isv2)]

I suspect it is related to the conditional checks. Doesn't isv2 mean "is an SD 2.x model"? I don't think merging SDXL LoRAs should end up on this line. Also, I found if isxl statements in the "merge to checkpoint" section of pluslora.py, while there are none in the "merge LoRAs" section. I tried merging SD 1.5 LoRAs and it worked fine. Merging an SDXL LoRA into a checkpoint also works without problems. I'm unfamiliar with Python, so I could be wrong, though.

Renan7696 commented 7 months ago

+1

n15g commented 7 months ago

A couple of temporary workarounds to get XL LoRA merging capability back:

Reset to the commit before the bug was introduced:

git reset --hard 97030c89bc94ada7f56e5adcd784e90887a8db6f

Or clone this temporary fork of the repo without the above issue (No fix, just pinned to the commit before the problem was introduced): https://github.com/n15g/sd-webui-supermerger

Either way, you'll lose a few commits' worth of updates, but merging capability is restored at a known-good commit.

thePlott commented 7 months ago

git reset --hard 97030c89bc94ada7f56e5adcd784e90887a8db6f

This worked for me. It's a temporary fix, but it made merging XL LoRAs work again. Now I just have to remember not to update! Thank you, n15g.

diodiogod commented 6 months ago

Going back with git reset --hard 97030c8 no longer works for merging SDXL LoRAs for me after I updated Automatic1111. It used to work on 1.8... If you could fix it, @hako-mikan, we would be very grateful!

Edit: OK, it was using the wrong LoRA name. Now it works, sorry. But it still needs to be on that old commit.

MysticDaedra commented 5 months ago

No fix?

nu3Dec74 commented 5 months ago

No fix?

It works in Forge after git reset --hard 97030c89bc94ada7f56e5adcd784e90887a8db6f

hako-mikan commented 2 months ago

The XL LoRAs I have merged so far have not caused any issues. Could you let me know which LoRAs you are trying to merge that result in an error? Also, please share the version of diffusers you are using.

moc67331 commented 1 month ago

Same here. I used a LoRA generated with sd-script (b755ebd). Attachment: sdscript-requirements.txt

The cause of this error seems to be that blockfromkey() returns index 27 for keys of the form "lora_te2_text_model_encoder_layers_xxx" (because the key partially matches the last element of the LBLOCKS26 array). The ratios variable has the same length as the LoRA block weight list, so indexing it with this value raises an index-out-of-range exception.
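To make the failure mode concrete, here is a minimal Python sketch of how a substring-based block lookup can return an index past the end of the ratios list. This is not the code from pluslora.py: BLOCK_FRAGMENTS, RATIOS, block_index and ratio_for are hypothetical stand-ins made up for illustration, and the table is far shorter than the real one. The range guard in ratio_for is only one possible way to avoid the exception and is not necessarily the correct merge behavior.

```python
# Hypothetical block-name fragments; NOT the real LBLOCKS26 table.
BLOCK_FRAGMENTS = ["te1_", "input_blocks_", "middle_block_", "output_blocks_", "te2_"]

# Hypothetical per-block ratios, one entry shorter than the fragment table.
RATIOS = [1.0, 0.5, 0.5, 0.5]


def block_index(key, fragments):
    """Return the index of the first fragment that occurs in key, else 0."""
    for i, fragment in enumerate(fragments):
        if fragment in key:
            return i
    return 0


def ratio_for(key, fragments, ratios, default=1.0):
    """Look up a ratio, falling back to default when the index is out of range."""
    idx = block_index(key, fragments)
    if idx >= len(ratios):
        return default  # assumption: treat unmapped blocks as "default ratio"
    return ratios[idx]


te1_key = "lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight"
te2_key = "lora_te2_text_model_encoder_layers_0_mlp_fc1.lora_down.weight"

print(block_index(te1_key, BLOCK_FRAGMENTS))  # within range of RATIOS
print(block_index(te2_key, BLOCK_FRAGMENTS))  # equals len(RATIOS), out of range

try:
    RATIOS[block_index(te2_key, BLOCK_FRAGMENTS)]  # unguarded lookup, as in the traceback
except IndexError as exc:
    print(f"IndexError: {exc}")

print(ratio_for(te2_key, BLOCK_FRAGMENTS, RATIOS))  # guarded lookup does not raise
```

Whether a default ratio is the right value for the second text encoder is an open question; a fix that maps those keys to the intended block weight may be more appropriate.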

The attached patch seems to resolve the error, but I'm not sure it is entirely correct. I also think there is a slight difference between the output generated with the patched merged LoRA and the output generated with the default LoRA. Attachment: fix.patch.txt

I’m still learning in this area, so please let me know if I’ve misunderstood anything.