xuekt98 / BBDM

BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
MIT License

Model Parameters #45

Open Wazhee opened 8 months ago

Wazhee commented 8 months ago

Great Work!

I was curious what the various options for the objective parameter mean. I'm looking at the BBDM.yaml file and see " objective: 'grad' # options {'grad', 'noise', 'ysubx'}", with no clue what they refer to. Can you provide some insight or references?

nandishaivalli commented 4 months ago

Different objective types ('grad', 'noise', 'ysubx') provide different guidance:

- 'grad': This objective incorporates information from both the encoded source and target images, along with noise and interpolation coefficients, to steer the sampling process in a way that considers both the starting point and the desired target.
- 'noise': This objective simply uses the noise itself, allowing for a more random exploration of the latent space.
- 'ysubx': This objective directly uses the difference between the encoded target and source images, guiding the model towards reducing that difference and reaching the source representation.

AlfredHall77 commented 3 months ago

Thank you for the clear explanation! The default objective type is set to "grad" in both the code and the paper. Have you ever tried training and testing with "noise" or "ysubx"? I wonder how the results would differ. If I want to achieve well-aligned image-to-image translation, would "noise" or "ysubx" give better results (for example, higher PSNR and SSIM)?

nandishaivalli commented 3 months ago

I don't have enough compute to run the training right now, and the trained model is not hosted on a site that Chrome treats as secure, so I could not download and fine-tune it. But anyway...

Unlike other diffusion models, BBDM adds noise to the image differently: $q_{BB}(x_t \mid x_0, y) = \mathcal{N}(x_t; (1 - m_t)\,x_0 + m_t\,y, \delta_t I)$ (see model/BrownianBridge/BrownianBridgeModel.py#L128). It takes both the initial state and the final state into account when computing the noise and removing it (more or less).

objective = m_t * (y - x0) + sigma_t * noise

This is the default 'grad' objective: the quantity the model learns to remove, based on both the initial and the target domain.
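To see where this expression comes from, here is a short derivation. It assumes that sigma_t in the code is the standard deviation $\sqrt{\delta_t}$ of the forward distribution quoted above, and $\epsilon$ is my own symbol for the sampled noise:

```latex
\begin{aligned}
x_t &= (1 - m_t)\,x_0 + m_t\,y + \sqrt{\delta_t}\,\epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I) \\
x_t - x_0 &= m_t\,(y - x_0) + \sqrt{\delta_t}\,\epsilon
\end{aligned}
```

Under that assumption, the 'grad' target is just x_t - x0, so subtracting the model's prediction from x_t gives an estimate of x0.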

If you use objective = 'noise', BBDM acts like a normal diffusion model on the latent vectors obtained from the VQGAN.
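The three options can be compared side by side with a minimal NumPy sketch. This is an illustration of the formulas discussed in this thread, not the repository's code: the function signature, the `rng` argument, and passing `m_t` and `sigma_t` in directly (instead of looking them up from a schedule) are my own assumptions.

```python
import numpy as np

def q_sample(x0, y, m_t, sigma_t, objective="grad", rng=None):
    """Draw x_t from the bridge forward process and build the training target.

    Hypothetical helper: m_t and sigma_t are passed in as plain numbers here.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(x0.shape)

    # Forward sample: mean interpolates between x0 and y, plus scaled noise.
    x_t = (1.0 - m_t) * x0 + m_t * y + sigma_t * noise

    if objective == "grad":
        # Depends on both endpoints and the noise; equals x_t - x0.
        target = m_t * (y - x0) + sigma_t * noise
    elif objective == "noise":
        # Only the sampled noise, as in standard noise-prediction training.
        target = noise
    elif objective == "ysubx":
        # Only the difference between the two endpoints.
        target = y - x0
    else:
        raise NotImplementedError(objective)

    return x_t, target
```

Whichever option is chosen, x_t is sampled the same way; only the quantity the network is asked to predict changes.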

nandishaivalli commented 3 months ago

Or, as I just did, work through the math to get an intuition:

image

Go through the p_sample and q_sample functions for a better understanding.
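As a rough guide to what to look for during sampling, the sketch below shows how an estimate of x0 could be recovered from the network's prediction under each objective. It follows algebraically from x_t = (1 - m_t) * x0 + m_t * y + sigma_t * noise; the function name and signature are hypothetical and are not taken from the repository.

```python
import numpy as np

def predict_x0(x_t, y, m_t, sigma_t, prediction, objective="grad"):
    """Invert the forward relation to estimate x0 from a predicted target.

    Hypothetical helper; `prediction` stands in for the network output.
    """
    if objective == "grad":
        # prediction ~ x_t - x0
        return x_t - prediction
    if objective == "noise":
        # prediction ~ noise; solve the forward relation for x0.
        # Not defined at m_t == 1, where x_t carries no information about x0.
        return (x_t - m_t * y - sigma_t * prediction) / (1.0 - m_t)
    if objective == "ysubx":
        # prediction ~ y - x0
        return y - prediction
    raise NotImplementedError(objective)
```

With a perfect prediction all three branches return the same x0; in practice they differ in how prediction errors propagate, for example the 'noise' branch divides by (1 - m_t), which becomes small near the y end of the bridge.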

AlfredHall77 commented 3 months ago

This gave me a lot of inspiration, thank you so much!