saiboxx / chexray-diffusion

Code for "Cascaded Latent Diffusion Models for High-Resolution Chest X-ray Synthesis" @ PAKDD 2023
MIT License

Request for details on Creating Low Resolution Samples #8

Open aryangoyal7 opened 3 months ago

aryangoyal7 commented 3 months ago

In the paper, it is mentioned that to train the SR module, high-resolution images were downsampled using a bicubic filter. I am interested in understanding more about the methods used to create the low-resolution samples. Specifically, I have the following questions:

1. Besides using the bicubic filter, what other methods were employed to create low-resolution samples?
2. Were any artificial noise or blur introduced to the low-resolution samples? If so, could you please provide details?
3. Could you share the exact script or code used to create these low-resolution samples from high-resolution images?

saiboxx commented 3 months ago

Hi,

Thanks for your interest in our work.

For training the SR module, we follow the procedure described in the original SR3 paper (Saharia et al.). To produce low-resolution samples, we simply use the standard Resize function in PyTorch (see here). I also recommend having a look at our training script.
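As a rough illustration of the downsampling step, here is a minimal sketch that turns a high-resolution batch into low-resolution samples. It uses `torch.nn.functional.interpolate` with bicubic interpolation as a stand-in for the Resize call mentioned above. The function name `make_low_res`, the 256 px target, the antialiasing flag, and the clamping are assumptions for illustration, not necessarily the repository's exact settings.

```python
import torch
import torch.nn.functional as F


def make_low_res(hr: torch.Tensor, size: int = 256) -> torch.Tensor:
    """Downsample a batch of images (N, C, H, W) to (N, C, size, size).

    Assumption: plain bicubic resizing with antialiasing; no extra noise
    or blur is added on top of the resize.
    """
    lr = F.interpolate(
        hr,
        size=(size, size),
        mode="bicubic",
        align_corners=False,
        antialias=True,
    )
    # Bicubic kernels can overshoot, so clamp back to the input value range.
    return lr.clamp(hr.min().item(), hr.max().item())


# Random stand-in for a batch of 1024 px grayscale X-rays scaled to [0, 1].
hr_batch = torch.rand(2, 1, 1024, 1024)
lr_batch = make_low_res(hr_batch)
print(lr_batch.shape)  # torch.Size([2, 1, 256, 256])
```

During training, the low-resolution batch would serve as conditioning and the original high-resolution batch as the target.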

We found that, in our cascaded pipeline, an SR module trained with the method and script described above does not upscale samples from the VAE decoder at the desired level of quality. So we conducted an additional fine-tuning step for the SR module. We used the LDM VAE to encode and decode the training dataset, which gives low-resolution images that match the output distribution of the decoder, i.e.: "real" 1024 px image --> resize to 256 px --> encode --> decode --> decoded "real" 256 px image, used as conditioning for the SR module.
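The fine-tuning data flow can be sketched roughly as follows. This is only an illustration under stated assumptions: `DummyVAE` is a hypothetical stand-in with `encode`/`decode` methods (the real LDM VAE has its own interface and weights), and `make_sr_conditioning` is an illustrative helper name, not a function from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DummyVAE(nn.Module):
    """Hypothetical stand-in for the LDM VAE (4x spatial compression)."""

    def __init__(self, channels: int = 1, latent_channels: int = 4):
        super().__init__()
        self.enc = nn.Conv2d(channels, latent_channels, kernel_size=4, stride=4)
        self.dec = nn.ConvTranspose2d(
            latent_channels, channels, kernel_size=4, stride=4
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return self.enc(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.dec(z)


@torch.no_grad()
def make_sr_conditioning(
    hr: torch.Tensor, vae: nn.Module, low_res: int = 256
) -> torch.Tensor:
    """'Real' high-res image -> resize -> encode -> decode -> conditioning."""
    # Step 1: resize the real image down to the resolution the VAE works at.
    lr = F.interpolate(
        hr,
        size=(low_res, low_res),
        mode="bicubic",
        align_corners=False,
        antialias=True,
    )
    # Step 2: round-trip through the VAE so the conditioning image carries
    # the same reconstruction artifacts as images produced by the decoder.
    return vae.decode(vae.encode(lr))


vae = DummyVAE().eval()
hr = torch.rand(2, 1, 1024, 1024)  # random stand-in for real 1024 px images
cond = make_sr_conditioning(hr, vae)
print(cond.shape)  # torch.Size([2, 1, 256, 256])
```

The SR module would then be fine-tuned on pairs of `cond` (conditioning) and `hr` (target), so that it sees inputs resembling decoder outputs instead of cleanly resized images.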

Does this help you?

Cheers, Tobias