xuekt98 / BBDM

BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
MIT License
283 stars 30 forks

training on higher res #8

Open dearkafka opened 1 year ago

dearkafka commented 1 year ago

Can you advise how to change params in order to train on 512?

xuekt98 commented 1 year ago

Thank you for your attention. If you would like to train in latent space, you may need to train a 512 VQGAN. Alternatively, you can train in pixel space: just modify configs/Template-BBDM and replace image_size.
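For reference, a minimal sketch of the pixel-space change: only the two image_size fields need to be bumped (field paths taken from the template config shown later in this thread; the exact file layout may differ between versions).

```yaml
# Sketch: fields to change in configs/Template-BBDM for 512 pixel-space training.
data:
  dataset_config:
    image_size: 512   # dataset loader resolution
model:
  BB:
    params:
      UNetParams:
        image_size: 512   # UNet input resolution must match
```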

dearkafka commented 1 year ago

thank you for the details, I will proceed with pixel space, I guess. Have you tried any experiments with higher resolution yourselves?

xuekt98 commented 1 year ago

Actually, we haven't tried higher resolution due to the computational cost. So, I am curious about the experiment results.

dearkafka commented 1 year ago

quick update, probably not going to go with higher resolution for now. Tried 256 for img2img; too much identity change for my taste (e.g. compared to pix2pix)

egshkim commented 1 year ago

> Can you advise how to change params in order to train on 512?

You can download such a VQGAN checkpoint from here: https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip. It expects (batch_size, 512, 512, 3) images and encodes them into (batch_size, 3, 128, 128) latents.
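As a sanity check on those shapes: this checkpoint behaves like a VQGAN encoder with 4x spatial downsampling and 3 latent channels (an assumption inferred from the numbers above, not from the checkpoint's config), so the latent shape can be computed like this:

```python
def vqgan_latent_shape(batch_size, height, width, z_channels=3, f=4):
    """Latent shape a VQGAN encoder produces for an input image batch.

    Assumes an encoder with f-times spatial downsampling and z_channels
    latent channels, matching the (512, 512, 3) -> (3, 128, 128) mapping
    described above for this checkpoint (hypothetical helper, not BBDM API).
    """
    return (batch_size, z_channels, height // f, width // f)

print(vqgan_latent_shape(4, 512, 512))  # (4, 3, 128, 128)
```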

> Actually, we haven't tried higher resolution due to the computational cost. So, I am curious about the experiment results.

I'm doing some experiments with 512 x 512 images. I'll let you know the results. : )

frmrz commented 7 months ago

Hi, I'm experimenting with the BBDM model in pixel space using 512x512x1 images for an image enhancement task.

I managed to find a configuration that lets me train the model using 22GB of VRAM. After 1 day of training, the results are heading in the right direction in terms of contrast improvement, but the images are very blurry. My understanding is that this is due to the reduced number of filters in the network and the reduced attention heads.

I have a couple of questions also related to issues #46 #27 #8:

My config:

# Brownian Bridge Diffusion Model Template (Pixel Space)
runner: "BBDMRunner"
training:
  n_epochs: 200
  n_steps: 400000
  save_interval: 20
  sample_interval: 2
  validation_interval: 20
  accumulate_grad_batches: 1

testing:
  clip_denoised: True
  sample_num: 1

data:
  dataset_name: ''
  dataset_type: 'custom_aligned'
  dataset_config:
    dataset_path: ''
    image_size: 512
    channels: 1
    to_normal: True
    flip: False
  train:
    batch_size: 1
    shuffle: True
  val:
    batch_size: 1
    shuffle: True
  test:
    batch_size: 1
    # shuffle: False

model:
  model_name: "BrownianBridge" # part of result path
  model_type: "BBDM" # specify a module
  latent_before_quant_conv: False
  normalize_latent: False
  only_load_latent_mean_std: False
  # model_load_path:  # model checkpoint path
  # optim_sche_load_path:  # optimizer scheduler checkpoint path

  EMA:
    use_ema: False
    ema_decay: 0.995
    update_ema_interval: 8 # step
    start_ema_step: 30000

  CondStageParams:
    n_stages: 2
    in_channels: 1
    out_channels: 1

  BB:
    optimizer:
      weight_decay: 0.000
      optimizer: 'Adam'
      lr: 1.e-4
      beta1: 0.9

    lr_scheduler:
      factor: 0.5
      patience: 3000
      threshold: 0.0001
      cooldown: 3000
      min_lr: 5.e-7

    params:
      mt_type: 'linear' # options {'linear', 'sin'}
      objective: 'grad' # options {'grad', 'noise', 'ysubx'}
      loss_type: 'l1' # options {'l1', 'l2'}

      skip_sample: True
      sample_type: 'linear' # options {"linear", "sin"}
      sample_step: 200

      num_timesteps: 1000 # timesteps
      eta: 1.0 # DDIM reverse process eta
      max_var: 1.0 # maximum variance

      UNetParams:
        image_size: 512
        in_channels: 1
        model_channels: 32
        out_channels: 1
        num_res_blocks: 1
        attention_resolutions: !!python/tuple
          - 32
          - 16
          - 8
        channel_mult: !!python/tuple
          - 1
          - 4
          - 8
        conv_resample: True
        dims: 2
        num_heads: 8
        num_head_channels: 64
        use_scale_shift_norm: True
        resblock_updown: True
        use_spatial_transformer: False
        context_dim:
        condition_key: "nocond" # options {"SpatialRescaler", "first_stage", "nocond"}
PrinceYao1001 commented 6 months ago

@frmrz Hi, I'm experimenting with the BBDM model in pixel space using 256x256x3 images. How can I change the params? I hope you can give me some help, thank you.