harubaru / waifu-diffusion

stable diffusion finetuned on weeb stuff
GNU Affero General Public License v3.0

About configuration of first stage model #27

Closed eeyrw closed 1 year ago

eeyrw commented 1 year ago

I found a difference between the finetune and inference configs in the `resolution` item. This is the finetune config, where resolution is set to 512:

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

This is the inference config, where resolution is set to 256:

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

As far as I know, the first stage model (AutoencoderKL) is frozen during finetuning. So what is the purpose of changing the resolution? I would expect inference and finetuning to share the same parameters.
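To double-check that `resolution` is the only item that differs, the two `ddconfig` blocks can be compared programmatically. This is a minimal sketch: `diff_configs` and `make_ddconfig` are hypothetical helpers written for this comparison, and the values are copied by hand from the two configs above rather than loaded from the actual YAML files.

```python
def diff_configs(a, b, prefix=""):
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing entry."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as ddconfig.
            diffs.update(diff_configs(va, vb, path))
        elif va != vb:
            diffs[path] = (va, vb)
    return diffs


def make_ddconfig(resolution):
    """ddconfig values copied from the configs quoted above (hypothetical helper)."""
    return {
        "double_z": True,
        "z_channels": 4,
        "resolution": resolution,
        "in_channels": 3,
        "out_ch": 3,
        "ch": 128,
        "ch_mult": [1, 2, 4, 4],
        "num_res_blocks": 2,
        "attn_resolutions": [],
        "dropout": 0.0,
    }


finetune = {"embed_dim": 4, "ddconfig": make_ddconfig(512)}
inference = {"embed_dim": 4, "ddconfig": make_ddconfig(256)}

# Only the resolution entry should show up in the diff.
print(diff_configs(finetune, inference))
```

With the values above, the only reported difference is `ddconfig.resolution`, 512 versus 256.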

harubaru commented 1 year ago

I was using the Stable Diffusion defaults for the first stage config.
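For context, here is a rough sketch of why the different value probably does not matter for these configs. It assumes the autoencoder's encoder uses `resolution` only to track the current feature-map size and decide which levels get attention blocks (checking it against `attn_resolutions`), which is my reading of the latent-diffusion code and not something confirmed in this thread. `attention_levels` is a hypothetical stand-in for that logic, not a function from the `ldm` package.

```python
def attention_levels(resolution, ch_mult, attn_resolutions):
    """Return the downsampling levels that would receive attention blocks.

    Simplified model (an assumption, not the real ldm code): the feature-map
    size starts at `resolution` and is halved after every level except the last.
    """
    curr_res = resolution
    levels = []
    for i_level in range(len(ch_mult)):
        if curr_res in attn_resolutions:
            levels.append(i_level)
        if i_level != len(ch_mult) - 1:
            curr_res //= 2
    return levels


ch_mult = [1, 2, 4, 4]

# With attn_resolutions empty, as in both configs above, no level gets
# attention, so 512 and 256 describe the same set of layers.
print(attention_levels(512, ch_mult, []))
print(attention_levels(256, ch_mult, []))

# The value would only matter if attn_resolutions were non-empty.
print(attention_levels(512, ch_mult, [32]))
print(attention_levels(256, ch_mult, [32]))
```

Under that assumption, both settings build the same layers and load the same frozen weights, so the mismatch between the two configs would be harmless.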