Open frmrz opened 3 months ago
You only need "image_dropout : 0.5" if you want to use the model for pure generation or with classifier-free guidance; otherwise you do not need to train with dropout.
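For context, classifier-free guidance (which is why the image dropout would be needed) combines an unconditional and a conditional noise prediction at sampling time. A minimal sketch of that combination step, independent of this repo's code (function and variable names are illustrative, not from the repository):

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the conditional one. guidance_scale = 1.0
    # recovers the plain conditional prediction; 0.0 the unconditional.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy example with dummy noise predictions
eps_u = np.zeros(4)
eps_c = np.ones(4)
print(cfg_noise(eps_u, eps_c, 1.0))  # equals eps_c
```

Training with image dropout is what gives the model a usable unconditional prediction in the first place; without guidance, there is no need for it.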
"image_mode : False" would be better for real images, as image_mode : True is usually smother.
The network is fully convolutional, so you can still run inference at any resolution even if you train on crops. As with GANs, the loss stops improving while the results keep getting better; I don't know why. Judge by the images themselves, not by the loss.
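The point about training on crops but inferring at full resolution follows from convolutions being size-agnostic. A tiny sketch with a hand-rolled "same"-padded convolution (plain NumPy, not the repo's network) shows the output shape always matches the input shape:

```python
import numpy as np

def same_conv2d(image, kernel):
    # Zero-pad so the output has the same spatial size as the input.
    # A network built from such layers accepts any input resolution,
    # which is why crop-trained models can run on full 512x512 images.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

k = np.ones((3, 3)) / 9.0  # illustrative smoothing kernel
print(same_conv2d(np.ones((64, 64)), k).shape)    # crop-sized input -> (64, 64)
print(same_conv2d(np.ones((512, 512)), k).shape)  # full-resolution input -> (512, 512)
```

The same weights process both inputs; only layers with a fixed input size (e.g. fully connected heads) would break this property.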
The times sound about right. 128 takes about a night, which is good for hyperparameter searching and debugging. 256 takes two days, and 512 should take 3-7 days.
Diffusion training is slow on a single GPU, and without conditioning it takes even twice as long. I use low resolution for hyperparameter search. Alternatively, you could train an autoencoder and implement latent diffusion to speed up diffusion training. I have not done this yet.
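To see why latent diffusion helps, a back-of-the-envelope count of spatial positions per denoising step (assuming the common 8x autoencoder downsampling factor; this repo does not implement it, so the numbers are purely illustrative):

```python
# Cost comparison: pixel-space vs latent-space diffusion at 512x512,
# assuming a hypothetical autoencoder with 8x spatial downsampling.
pixel_res = 512
down_factor = 8
latent_res = pixel_res // down_factor  # 64

pixel_elems = pixel_res ** 2
latent_elems = latent_res ** 2
print(pixel_elems // latent_elems)  # 64x fewer spatial positions per step
```

The denoiser's per-step cost scales roughly with the number of spatial positions, so the diffusion part of training gets much cheaper, at the price of first training the autoencoder.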
Thanks for the quick answer! I want to condition the generation on the input, as in a paired task. So I can set "image_dropout : 0.0", right?
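For a paired task where the condition image is always present, the relevant part of a training config might look like this (key names taken from this thread; values and comments are assumptions, not verified against the repo):

```yaml
image_dropout : 0.0   # condition always kept; no classifier-free guidance
image_mode : False    # suggested above as better for real images
```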
Hi, thanks for sharing a great work!
I'm trying to use the repo for paired 2D image enhancement. I'm using two datasets, one for grayscale image enhancement and one for an RGB color normalization task. One of the main issues is that both datasets have 512x512 resolution and I can't use the patching trick for the experiments I'm doing.
I managed to train a diffusion model for the image enhancement task, but after three days of training I get only small improvements in image quality. To train the model I used the bf16 tensor format and the following settings:
The loss improves for the first 250k iterations but then stops decreasing.
![image](https://github.com/robert-graf/Readable-Conditional-Denoising-Diffusion/assets/43776981/71a38a68-7c91-4bad-b3d5-394535134bb7)
Do you have any advice on how to tune the hyperparameters to get better results with high-resolution images?