dearkafka opened 1 year ago
Thank you for your attention. If you would like to train in latent space, you may need to train a 512 VQGAN. Alternatively, you can train in pixel space; in that case, just modify configs/Template-BBDM and replace image_size.
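For the pixel-space route, the change amounts to updating the size fields in the template. A minimal excerpt (field names taken from the config pasted later in this thread; adjust to your copy of Template-BBDM):

```yaml
# configs/Template-BBDM.yaml (excerpt) -- train directly in pixel space
data:
  dataset_config:
    image_size: 512   # was 256
model:
  BB:
    params:
      UNetParams:
        image_size: 512   # keep in sync with data.dataset_config.image_size
```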
Thank you for the details, I will proceed with pixel space, I guess. Have you tried any experiments with higher resolution yourselves?
Actually, we haven't tried higher resolution due to the computational cost. So, I am curious about the experiment results.
Quick update: probably not going to go with higher resolution for now. I tried 256 for img2img, but there was too much identity change for my taste (e.g. compared to pix2pix).
Can you advise how to change params in order to train on 512?
You can download such a VQGAN checkpoint from here: https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip This expects (batch_size, 512, 512, 3) images and encodes them into (batch_size, 3, 128, 128) latents.
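As a quick sanity check of those shapes (a sketch; `compression_stats` is a hypothetical helper, not part of either repo):

```python
# Sketch: compression of the linked 512-px VQGAN, using the shapes quoted
# above (512x512x3 pixel input -> 3x128x128 latent output).

def compression_stats(h, w, c, lh, lw, lc):
    """Return (per-side downsampling factor, total element reduction)."""
    assert h % lh == 0 and w % lw == 0, "latent must evenly divide the image"
    factor = h // lh                      # spatial downsampling per side
    ratio = (h * w * c) / (lh * lw * lc)  # how many times fewer elements
    return factor, ratio

factor, ratio = compression_stats(512, 512, 3, 128, 128, 3)
print(factor, ratio)  # 4x per side, 16x fewer elements overall
```

So the diffusion model would operate on a tensor 16x smaller than the raw pixels, which is where the memory savings of latent-space training come from.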
I'm doing some experiments with 512 x 512 images. I'll let you know the results. : )
Hi, I'm experimenting with the BBDM model in pixel space using 512x512x1 images for an image enhancement task.
I managed to find a configuration that lets me train the model using 22GB of VRAM. After 1 day of training, the results are moving in the right direction in terms of contrast improvement, but the images are very blurry. My understanding is that this is due to the reduced number of filters and attention heads in the network.
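The capacity hit from shrinking the network is easy to underestimate, because convolutional parameter counts grow roughly quadratically with channel width. A back-of-the-envelope sketch (illustrative single-layer numbers, not the real UNet total):

```python
# Sketch: parameter count of one 3x3 conv layer as a function of width.
# Halving model_channels roughly quarters per-layer capacity, which is one
# plausible reason a width-32 pixel-space model at 512px comes out blurry.

def conv_params(c_in, c_out, k=3):
    """Weights + biases of a single k x k convolution."""
    return c_in * c_out * k * k + c_out

for width in (32, 64, 128):
    p = conv_params(width, width)
    print(f"model_channels={width}: ~{p:,} params per 3x3 conv")
```

Expected trend: 32 -> 9,248 params, 64 -> 36,928, 128 -> 147,584 per layer, i.e. a 16x spread between the smallest and largest width.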
I have a couple of questions also related to issues #46 #27 #8:
My config:

```yaml
# Brownian Bridge Diffusion Model Template (Pixel Space)
runner: "BBDMRunner"
training:
  n_epochs: 200
  n_steps: 400000
  save_interval: 20
  sample_interval: 2
  validation_interval: 20
  accumulate_grad_batches: 1

testing:
  clip_denoised: True
  sample_num: 1

data:
  dataset_name: ''
  dataset_type: 'custom_aligned'
  dataset_config:
    dataset_path: ''
    image_size: 512
    channels: 1
    to_normal: True
    flip: False
  train:
    batch_size: 1
    shuffle: True
  val:
    batch_size: 1
    shuffle: True
  test:
    batch_size: 1
    # shuffle: False

model:
  model_name: "BrownianBridge" # part of result path
  model_type: "BBDM" # specify a module
  latent_before_quant_conv: False
  normalize_latent: False
  only_load_latent_mean_std: False
  # model_load_path:  # model checkpoint path
  # optim_sche_load_path:  # optimizer scheduler checkpoint path

  EMA:
    use_ema: False
    ema_decay: 0.995
    update_ema_interval: 8 # step
    start_ema_step: 30000

  CondStageParams:
    n_stages: 2
    in_channels: 1
    out_channels: 1

  BB:
    optimizer:
      weight_decay: 0.000
      optimizer: 'Adam'
      lr: 1.e-4
      beta1: 0.9

    lr_scheduler:
      factor: 0.5
      patience: 3000
      threshold: 0.0001
      cooldown: 3000
      min_lr: 5.e-7

    params:
      mt_type: 'linear' # options {'linear', 'sin'}
      objective: 'grad' # options {'grad', 'noise', 'ysubx'}
      loss_type: 'l1' # options {'l1', 'l2'}

      skip_sample: True
      sample_type: 'linear' # options {"linear", "sin"}
      sample_step: 200

      num_timesteps: 1000 # timesteps
      eta: 1.0 # DDIM reverse process eta
      max_var: 1.0 # maximum variance

      UNetParams:
        image_size: 512
        in_channels: 1
        model_channels: 32
        out_channels: 1
        num_res_blocks: 1
        attention_resolutions: !!python/tuple
          - 32
          - 16
          - 8
        channel_mult: !!python/tuple
          - 1
          - 4
          - 8
        conv_resample: True
        dims: 2
        num_heads: 8
        num_head_channels: 64
        use_scale_shift_norm: True
        resblock_updown: True
        use_spatial_transformer: False
        context_dim:
        condition_key: "nocond" # options {"SpatialRescaler", "first_stage", "nocond"}
```
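One thing worth double-checking in a config like this is whether the attention layers ever fire. In LDM-style UNets, attention_resolutions is (on my reading of that convention, an assumption worth verifying against the BBDM source) compared against the cumulative downsampling factor at each level, not the spatial size. A short sketch:

```python
# Sketch: which UNet levels would get attention, assuming
# attention_resolutions holds downsampling factors that are compared against
# each level's cumulative downsample (LDM openaimodel convention; verify
# against the repo before relying on this).

image_size = 512
channel_mult = (1, 4, 8)
attention_resolutions = (32, 16, 8)

ds = 1
for level in range(len(channel_mult)):
    spatial = image_size // ds
    has_attn = ds in attention_resolutions
    print(f"level {level}: {spatial}x{spatial}, ds={ds}, attention={has_attn}")
    if level != len(channel_mult) - 1:
        ds *= 2  # downsample between levels
# With only 3 levels, ds reaches at most 4, so none of {32, 16, 8} is ever
# matched and the model would run without any attention layers at all.
```

If that convention holds here, attention_resolutions for a 3-level network should contain values like 1, 2, or 4 instead, which could be one contributor to the blurriness reported above.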
@frmrz Hi, I'm experimenting with the BBDM model in pixel space using 256x256x3 images. How can I change the params? I hope you can give me some help, thank you.