ogkalu2 / Merge-Stable-Diffusion-models-without-distortion

Adaptation of the merging method described in the paper "Git Re-Basin: Merging Models modulo Permutation Symmetries" (https://arxiv.org/abs/2209.04836) for Stable Diffusion
MIT License

Weird bg connections? #45

Open Xynonners opened 6 months ago

Xynonners commented 6 months ago

Hi, I'm interested in understanding what the code does.

     **easyblock("model.diffusion_model.output_blocks.6.0", "P_bg208","P_bg209"),
     **conv("model.diffusion_model.output_blocks.6.0.skip_connection","P_bg210","P_bg211"),
     **norm("model.diffusion_model.output_blocks.6.1.norm", "P_bg212"),
     **conv("model.diffusion_model.output_blocks.6.1.proj_in", "P_bg212", "P_bg213"),
     **dense("model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn1.to_q", "P_bg214", "P_bg215", bias=False),

In some sections, the `P_bg` groups are disconnected, in the sense that each layer gets its own set of permutation names (no overlap with neighbouring layers).
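(For reference, my reading of these helpers, following the axes-to-perm convention from the original git-re-basin code: each parameter name maps to a tuple with one permutation name per tensor axis, and `None` for axes left unpermuted. The bodies below are my guess at what the repo's helpers expand to, not copied from it:)

     def dense(name, p_in, p_out, bias=True):
         # nn.Linear weight is (out_features, in_features): rows follow p_out, columns p_in
         axes = {f"{name}.weight": (p_out, p_in)}
         if bias:
             axes[f"{name}.bias"] = (p_out,)
         return axes

     def conv(name, p_in, p_out):
         # Conv2d weight is (out_ch, in_ch, kH, kW): only the channel axes get permuted
         # (assuming the conv carries a bias here)
         return {f"{name}.weight": (p_out, p_in, None, None), f"{name}.bias": (p_out,)}

     def norm(name, p):
         # GroupNorm/LayerNorm have a single channel axis, so weight and bias share one P
         return {f"{name}.weight": (p,), f"{name}.bias": (p,)}

So each `P_bg` name is one permutation group, and any two tensors that mention the same name have to be permuted consistently.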

     **dense("model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_out.0", "P_bg224","P_bg225", bias=True),
     **norm("model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm1", "P_bg225"),
     **norm("model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm2", "P_bg225"),
     **norm("model.diffusion_model.output_blocks.6.1.transformer_blocks.0.norm3", "P_bg225"),
     **conv("model.diffusion_model.output_blocks.6.1.proj_out", "P_bg225", "P_bg226"),

But in other sections (like the one including these norms), the `P_bg` groups are connected (they overlap at P_bg225).
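A quick way to see what a shared group ties together is to invert the spec (a sketch; `axes_to_perm` here stands for the merged dict built from all of these helper calls):

     from collections import defaultdict

     def perm_to_axes(axes_to_perm):
         # Invert the spec: for each P_bg name, collect every (param_name, axis) it controls
         grouped = defaultdict(list)
         for param_name, axis_perms in axes_to_perm.items():
             for axis, p in enumerate(axis_perms):
                 if p is not None:
                     grouped[p].append((param_name, axis))
         return dict(grouped)

For the block above, `P_bg225` would collect the output axis of `attn2.to_out.0`, all three norms, and the input axis of `proj_out`, so all of those tensors have to agree on a single permutation.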

     **norm("cond_stage_model.transformer.text_model.encoder.layers.1.layer_norm2", "P_bg375"),

     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.k_proj", "P_bg375", "P_bg376",bias=True),
     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.v_proj", "P_bg375", "P_bg376",bias=True),
     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.q_proj", "P_bg375", "P_bg376",bias=True),
     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.self_attn.out_proj", "P_bg375", "P_bg376",bias=True),
     **norm("cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm1", "P_bg376"),
     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc1", "P_bg376", "P_bg377", bias=True),
     **dense("cond_stage_model.transformer.text_model.encoder.layers.2.mlp.fc2", "P_bg377", "P_bg378", bias=True),
     **norm("cond_stage_model.transformer.text_model.encoder.layers.2.layer_norm2", "P_bg378"),

     **dense("cond_stage_model.transformer.text_model.encoder.layers.3.self_attn.k_proj", "P_bg378", "P_bg379",bias=True),

In the text encoder, everything seems to be connected, since every layer has a norm.
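(My understanding of why connected layers should share a name: permuting one layer's output channels and the next layer's input channels by the same P leaves the function unchanged. A toy PyTorch check, unrelated to the repo's code:)

     import torch

     torch.manual_seed(0)
     P = torch.randperm(8)
     W1, b1 = torch.randn(8, 4), torch.randn(8)   # layer 1: 4 -> 8
     W2 = torch.randn(3, 8)                       # layer 2: 8 -> 3
     x = torch.randn(4)

     y = W2 @ torch.relu(W1 @ x + b1)
     # same P on layer 1's output axis and layer 2's input axis
     y_perm = W2[:, P] @ torch.relu(W1[P] @ x + b1[P])
     assert torch.allclose(y, y_perm, atol=1e-5)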

     **conv("model.diffusion_model.output_blocks.10.0.skip_connection","P_bg288","P_bg289"), 
     **norm("model.diffusion_model.output_blocks.10.1.norm", "P_bg290"),
     **conv("model.diffusion_model.output_blocks.10.1.proj_in", "P_bg290", "P_bg291"),

But here in the UNet, the norm seems to be disconnected on one side: the skip connection outputs P_bg289, while the norm uses P_bg290?

Could you explain the reasoning behind these choices? Thanks. (I'm trying to figure out how to implement re-basin correctly.)

Xynonners commented 6 months ago

@AI-Casanova

ogkalu2 commented 6 months ago

@Xynonners

Well, it's been a while, but I'm pretty sure I disconnected them because the weight-matching algorithm wouldn't compute when they were connected.
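For anyone trying to reproduce it: the part that has to "compute" for each group is, roughly, the paper's coordinate-descent weight matching — sum a correlation matrix over every tensor the group touches, then solve a linear assignment problem. A sketch of that step (assuming the standard algorithm, not this repo's exact code; `axes_to_perm` / `perm_to_axes` are the spec mappings discussed above):

     import numpy as np
     from scipy.optimize import linear_sum_assignment

     def get_permuted_param(name, params, axes_to_perm, perm, except_axis=None):
         # apply the current permutations of every *other* axis of this tensor
         w = params[name]
         for axis, p in enumerate(axes_to_perm[name]):
             if p is not None and axis != except_axis:
                 w = np.take(w, perm[p], axis=axis)
         return w

     def solve_group(p_name, perm, params_a, params_b, axes_to_perm, p2a):
         n = len(perm[p_name])
         score = np.zeros((n, n))
         for name, axis in p2a[p_name]:            # every tensor this group touches
             w_a = np.moveaxis(params_a[name], axis, 0).reshape(n, -1)
             w_b = get_permuted_param(name, params_b, axes_to_perm, perm, except_axis=axis)
             w_b = np.moveaxis(w_b, axis, 0).reshape(n, -1)
             score += w_a @ w_b.T                  # similarity of A-channel i vs B-channel j
         _, col = linear_sum_assignment(score, maximize=True)
         return col                                # updated permutation for this group

Disconnecting two layers just removes them from each other's `p2a` lists, so each side gets its own smaller assignment problem, at the cost of the permutation no longer being propagated across that boundary.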