junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

CycleGAN for higher resolution images #1550

Open junghyun-avikus opened 1 year ago

junghyun-avikus commented 1 year ago

Hi, thanks for your great work! I have been using CycleGAN for the last couple of days, training a model that converts a 3-channel color image into something similar to a UAV IR look-alike image with 1 channel.

It seems to be doing a decent job, but I noticed that when images are loaded they get resized to 286 and cropped to 256. Since 256 is too small for my task, I am considering increasing these to at least 640 or 960. Are there any downsides to doing so other than the computational cost?
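For reference, the resize-then-crop behavior described above can be sketched in plain Pillow; this is a minimal reproduction of the idea (not the repo's actual `resize_and_crop` code, and the exact interpolation and crop logic in the repo may differ):

```python
import random
from PIL import Image

def resize_and_crop(img, load_size=286, crop_size=256):
    """Resize to load_size x load_size, then take a random
    crop_size x crop_size crop -- the default preprocessing
    pattern the question describes (286 -> 256)."""
    img = img.resize((load_size, load_size), Image.BICUBIC)
    x = random.randint(0, load_size - crop_size)
    y = random.randint(0, load_size - crop_size)
    return img.crop((x, y, x + crop_size, y + crop_size))

sample = Image.new("RGB", (1280, 720))
print(resize_and_crop(sample).size)            # (256, 256) with defaults
print(resize_and_crop(sample, 700, 640).size)  # (640, 640) at higher resolution
```

In the repo itself this should correspond to the `--load_size` / `--crop_size` / `--preprocess` training options (e.g. something like `--load_size 700 --crop_size 640`), though it's worth double-checking the option names against `options/base_options.py` in your checkout.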

Also, Pix2Pix is known to work better than CycleGAN, but will it still work when the paired images have slightly different views? The cameras used to capture the color and IR images have different FOVs, so the color image has a wider view.

And personally, do you see any advantages of diffusion models over GANs for img2img translation? Diffusion models seem more promising and stable than GANs, but I am not sure whether they are the better choice for simply converting an image from one domain to another.

taesungp commented 1 year ago

Hello,

There should be no immediate downside to running it at a higher resolution.

Pix2Pix works better than CycleGAN if the underlying layout remains the same. For example, in the case of Facades and Cityscapes, one domain describes the layout of the other. In medical imaging, aligning the inputs and outputs of the training set may be nontrivial. For example, in this paper, the authors try image translation between MR and CT images. But because it is nearly impossible to get MR and CT images of the same patient from exactly the same camera view, CycleGAN ends up performing better than Pix2Pix.
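To make the alignment requirement concrete, the difference is visible in the objectives from the original papers: pix2pix includes a pixel-wise L1 term between the output and the paired target, which only makes sense when x and y are spatially registered, while CycleGAN replaces it with cycle-consistency terms that never compare images across domains:

```latex
% pix2pix: conditional GAN loss + paired L1 reconstruction (needs aligned (x, y))
\mathcal{L}_{\text{pix2pix}} = \mathcal{L}_{cGAN}(G, D)
  + \lambda \, \mathbb{E}_{x,y}\big[\, \| y - G(x) \|_1 \,\big]

% CycleGAN: two unpaired GAN losses + cycle consistency (no cross-domain pixel comparison)
\mathcal{L}_{\text{CycleGAN}} = \mathcal{L}_{GAN}(G, D_Y) + \mathcal{L}_{GAN}(F, D_X)
  + \lambda \big( \mathbb{E}_x \| F(G(x)) - x \|_1 + \mathbb{E}_y \| G(F(y)) - y \|_1 \big)
```

This is why a FOV mismatch between the color and IR cameras hurts pix2pix directly (the L1 term penalizes the misalignment itself) but not CycleGAN.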

Diffusion models will be great if you want text-based control of natural images. Look at these images, for example. They are much higher quality, thanks to 6 years of progress in research, modeling, data, and compute.

junghyun-avikus commented 1 year ago

@taesungp Thanks for your reply! Yeah, I guess I should stick with CycleGAN.

Yeah, I was quite surprised to see how diffusion models generate much higher quality images than GANs. So I was tempted to try them, but most diffusion models involve text in some way, such as image generation from a text prompt or image-to-image translation guided by an input text. So I was wondering whether diffusion models are unsuitable for img2img translation without any text. The diffusion process (denoising from Gaussian noise) seems irrelevant to text, but I may be missing something because I haven't actually read the papers yet.