vislearn / ControlNet-XS

Apache License 2.0

Why is your ControlNet-XS smaller than the original ControlNet? #9

Closed zhangvia closed 9 months ago

zhangvia commented 9 months ago

I read the blog, but it seems that models A and B are the same as the original ControlNet: both have a complete copy of the UNet encoder, and model C has a whole UNet. Why do they have smaller weights than the original ControlNet?

LoveU3tHousand2 commented 9 months ago

I guess you can reduce the channel dimension and the number of ResNet blocks, and keep the shapes matching through the zero-convs at the connections.

zhangvia commented 9 months ago

> I guess you can reduce the channel dimension and the number of ResNet blocks, and keep the shapes matching through the zero-convs at the connections.

I just compared the config files of ControlNet-XS and the original ControlNet, and they look almost the same. The configs are from sd21_encD_canny_14m.yaml and cldm_v15.yaml:

# sd21_encD_canny_14m.yaml (ControlNet-XS)
control_stage_config:
      target: ldm.modules.diffusionmodules.twoStreamControl.TwoStreamControlNet
      params:
        use_checkpoint: true
        image_size: 32
        in_channels: 4
        out_channels: 4
        hint_channels: 3
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_head_channels: 8
        use_spatial_transformer: true
        use_linear_in_transformer: true
        transformer_depth: 1
        context_dim: 1024
        legacy: false
        infusion2control: cat
        infusion2base: add
        guiding: encoder_double
        two_stream_mode: cross
        control_model_ratio: 0.0125

# cldm_v15.yaml (original ControlNet)
control_stage_config:
      target: cldm.cldm.ControlNet
      params:
        image_size: 32 # unused
        in_channels: 4
        hint_channels: 3
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

Maybe sd21_encD_canny_14m.yaml just adds the connections between the ControlNet encoder and the UNet encoder?
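
A quick way to see where the two configs actually differ is to diff their params. This is only a minimal sketch: the values are copied from the two snippets above, and the dict and variable names are illustrative, not taken from either repo.

```python
# Params copied by hand from the two config snippets in this comment.
# The names xs_params / orig_params are illustrative only.
xs_params = {
    "use_checkpoint": True, "image_size": 32, "in_channels": 4,
    "out_channels": 4, "hint_channels": 3, "model_channels": 320,
    "attention_resolutions": [4, 2, 1], "num_res_blocks": 2,
    "channel_mult": [1, 2, 4, 4], "num_head_channels": 8,
    "use_spatial_transformer": True, "use_linear_in_transformer": True,
    "transformer_depth": 1, "context_dim": 1024, "legacy": False,
    "infusion2control": "cat", "infusion2base": "add",
    "guiding": "encoder_double", "two_stream_mode": "cross",
    "control_model_ratio": 0.0125,
}
orig_params = {
    "image_size": 32, "in_channels": 4, "hint_channels": 3,
    "model_channels": 320, "attention_resolutions": [4, 2, 1],
    "num_res_blocks": 2, "channel_mult": [1, 2, 4, 4], "num_heads": 8,
    "use_spatial_transformer": True, "transformer_depth": 1,
    "context_dim": 768, "use_checkpoint": True, "legacy": False,
}

# Keys present on one side only.
only_in_xs = sorted(xs_params.keys() - orig_params.keys())
only_in_orig = sorted(orig_params.keys() - xs_params.keys())

# Shared keys whose values differ, as (ControlNet-XS, original) pairs.
changed = {
    key: (xs_params[key], orig_params[key])
    for key in xs_params.keys() & orig_params.keys()
    if xs_params[key] != orig_params[key]
}

print("only in ControlNet-XS:", only_in_xs)
print("only in original:", only_in_orig)
print("different values:", changed)
```

Among the shared keys only context_dim differs, which matches the different base models named in the file names (sd21 vs. v15). The keys that exist only on the ControlNet-XS side include control_model_ratio and the infusion settings, so those are the likely places to look for a size difference.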

zhangvia commented 9 months ago

> The out_channels of the control model are reduced from 320 to 4

I noticed that, but the ControlNet-XS code doesn't use out_channels in its init function. The out_channels of all res blocks are computed from model_channels, which is the same as in the original ControlNet code.

LoveU3tHousand2 commented 9 months ago

I guess you can reduce the channel dimension and the number of ResNet blocks, and keep the shapes matching through the zero-convs at the connections.

infusion_factor = int(1 / control_model_ratio)
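
The line above hints that the control branch is narrowed by control_model_ratio rather than by model_channels itself. Here is a rough sketch of the arithmetic. It assumes the control encoder's base width is int(model_channels * control_model_ratio) and that per-stage widths follow channel_mult as in the original ControlNet. The helper name stage_widths is hypothetical, and the actual sizing should be checked in twoStreamControl.py.

```python
def stage_widths(model_channels, channel_mult, ratio=1.0):
    """Per-stage channel widths, ASSUMING the base width is scaled by `ratio`.

    Hypothetical helper for illustration; not taken from the ControlNet-XS repo.
    """
    base = max(1, int(model_channels * ratio))
    return [base * mult for mult in channel_mult]


# Values from the configs quoted earlier in the thread.
model_channels = 320
channel_mult = [1, 2, 4, 4]
control_model_ratio = 0.0125

# The line quoted in the comment above.
infusion_factor = int(1 / control_model_ratio)

original_widths = stage_widths(model_channels, channel_mult)
xs_widths = stage_widths(model_channels, channel_mult, control_model_ratio)

print(infusion_factor)   # 80
print(original_widths)   # [320, 640, 1280, 1280]
print(xs_widths)         # [4, 8, 16, 16]
```

Under that assumption, both configs can list model_channels: 320 and the same num_res_blocks while the control branch is still far smaller. Convolution weights scale with in_channels × out_channels, so a much narrower branch carries far fewer parameters even with an identical block layout.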