robert-graf / Readable-Conditional-Denoising-Diffusion

Readable Conditional Denoising Diffusion

2D image enhancement #3

Open frmrz opened 3 months ago

frmrz commented 3 months ago

Hi, thanks for sharing this great work!

I'm trying to use the repo for paired 2D image enhancement. I'm using two datasets, one for grayscale image enhancement and one for an RGB color normalization task. One of the main issues is that both datasets are at 512x512 resolution, so I can't use the patching trick for the experiments I'm doing.

I managed to train a diffusion model for the image enhancement task, but after three days of training I only get small improvements in image quality. To train the model I used the bf16 tensor format and the following settings (a sketch of the bf16 training step I mean is shown after the settings block):

# Settings
lr : 0.0002
max_epochs : 2000
num_cpu : 16
exp_name : exp1
dataset : exp1_dsT # Just override this by using -ds {name}
dataset_val : exp1_dsV # Just override this by using -ds {name}
new : True  
size : 512 
L2_loss : False  
channels : 32
batch_size : 4  
timesteps : 1000 # should be 1000
conditional : True  # used only in Image2Image 
image_dropout : 0.5 # Does not work with Label2Image
flip : False  # used only in Image2Image, reverses the prediction direction
image_mode : True
lambda_ssim : 0.0
# patch_size : 

# Always recommended
learned_variance : False  
linear : False  
model_name : unet
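
For reference, this is a minimal sketch of the bf16 training step I mean (generic PyTorch autocast, not this repo's actual training loop; model, optimizer, and the loss call are placeholders):

import torch

def training_step(model, optimizer, condition, target):
    # Sketch: run the forward pass under bf16 autocast, backprop in fp32.
    # Unlike fp16, bf16 does not need a GradScaler.
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(condition, target)  # assumed to return the diffusion loss
    loss.backward()
    optimizer.step()
    return loss.detach()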

The loss improves for the first 250k iterations but then stops decreasing (see the attached training-loss plot).

Do you have any advice on how to tune the hyperparameters to achieve better results with high-resolution images?

robert-graf commented 3 months ago

You only need "image_dropout : 0.5" if you want to use the model for pure generation or with classifier-free guidance; you do not need it for plain paired training.
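
For illustration, conditioning dropout for classifier-free guidance is usually something like the sketch below (not the exact code in this repo; cond is the conditioning image batch and p the dropout probability):

import torch

def drop_condition(cond: torch.Tensor, p: float) -> torch.Tensor:
    # Sketch: with probability p, zero out the conditioning image of a sample
    # so the model also learns the unconditional score (needed for CFG).
    if p <= 0:
        return cond
    keep = (torch.rand(cond.shape[0], device=cond.device) >= p).float()
    return cond * keep.view(-1, 1, 1, 1)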

"image_mode : False" would be better for real images, as image_mode : True is usually smother.

The network is fully convolutional, meaning you can still run inference at any size even if you train on crops. As with GANs, the loss stops improving but the results keep getting better; I don't know why. You have to look at the images themselves, not at the loss.
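
As an illustration of training on crops while keeping full-resolution inference (a sketch with torchvision, not this repo's data pipeline; crop size, model, and the sampling call are placeholders):

import torch
from torchvision import transforms

# Sketch: train on random 128x128 crops, run the fully convolutional
# network on the full 512x512 image at inference time.
train_crop = transforms.RandomCrop(128)

def crop_pair(condition, target):
    # Apply the same random crop to both paired images by stacking them first.
    stacked = torch.cat([condition, target], dim=0)   # (2*C, H, W)
    cropped = train_crop(stacked)
    c = condition.shape[0]
    return cropped[:c], cropped[c:]

@torch.no_grad()
def infer_full_resolution(model, condition_512):
    # Placeholder sampling call on the uncropped 512x512 conditioning image.
    return model.sample(condition_512.unsqueeze(0))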

The training time sounds about right: 128x128 takes about a night, which is good for hyperparameter searching and debugging; 256x256 takes two days, and 512x512 should take 3-7 days.

Diffusion training is slow on a single GPU; without conditioning it takes even twice as much time. I use low resolution for hyperparameter search. Alternatively, you could train an autoencoder and implement latent diffusion to speed up the diffusion training. I have not done this yet.
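
If you want to try the latent-diffusion route, the rough idea is to encode both images with a pretrained autoencoder and run the same conditional diffusion on the latents instead of the 512x512 pixels. A sketch using the diffusers AutoencoderKL (the "stabilityai/sd-vae-ft-mse" checkpoint and the 0.18215 scaling factor are Stable Diffusion defaults, used here only as an example, not something from this repo):

import torch
from diffusers import AutoencoderKL

# Sketch: diffuse (B, 4, 64, 64) latents instead of (B, 3, 512, 512) images.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def to_latent(img):                       # img: (B, 3, 512, 512) in [-1, 1]
    return vae.encode(img).latent_dist.sample() * 0.18215

@torch.no_grad()
def to_image(lat):                        # lat: (B, 4, 64, 64)
    return vae.decode(lat / 0.18215).sample

# The conditional diffusion model would then be trained on
# (to_latent(condition), to_latent(target)) pairs.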

frmrz commented 3 months ago

Thanks for the quick answer! I want to condition the generation on the input image, as in a paired task. So I can set "image_dropout : 0.0", right?