phizaz / diffae

Official implementation of Diffusion Autoencoders
https://diff-ae.github.io/
MIT License

Issues with Conditional Sampling #75

Open aashishrai3799 opened 6 months ago

aashishrai3799 commented 6 months ago

Hi,

Consider the following lines of code:

cond1 = model.encode(batch)
xT = model.encode_stochastic(batch, cond1, T=50)
pred = model.render(noise=xT, cond=cond1, T=20)

xT_rand = torch.rand(xT.shape, device=device)
pred_rand = model.render(noise=xT_rand, cond=cond1, T=20)

The autoencoding above works as expected. However, if I use xT_rand instead of xT with the same cond1, I get nothing but noise in the predicted image. Could you please help me understand why that happens? As mentioned in the paper, most of the semantic information is captured in z_sem, so why does it fail in this case?

Your response will be greatly appreciated.

Thank you!

phizaz commented 6 months ago

torch.rand samples from a uniform distribution, which is not what the diffusion model was trained on. Please use torch.randn, which samples from a standard normal distribution.
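To make the difference between the two samplers concrete, here is a minimal sketch that uses only Python's standard library as a stand-in for the PyTorch calls. That substitution is an assumption made so the snippet runs without PyTorch or the model: `random.random()` plays the role of `torch.rand` (uniform on [0, 1)) and `random.gauss(0, 1)` plays the role of `torch.randn` (standard normal). The helper `sample_stats` is hypothetical and not part of diffae.

```python
import random
import statistics


def sample_stats(sampler, n=100_000, seed=0):
    """Return (mean, variance) of n draws from sampler(rng).

    Hypothetical helper for illustration only; not part of diffae.
    """
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(n)]
    return statistics.fmean(draws), statistics.pvariance(draws)


# Stand-in for torch.rand: uniform on [0, 1)
uniform_mean, uniform_var = sample_stats(lambda rng: rng.random())

# Stand-in for torch.randn: standard normal (mean 0, variance 1)
normal_mean, normal_var = sample_stats(lambda rng: rng.gauss(0.0, 1.0))

print(f"uniform: mean={uniform_mean:.3f}, var={uniform_var:.3f}")
print(f"normal:  mean={normal_mean:.3f}, var={normal_var:.3f}")
```

The uniform draws are never negative, with a mean near 0.5 and a variance near 1/12. The standard normal draws have a mean near 0 and a variance near 1, which is the kind of starting noise a diffusion sampler typically expects.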

aashishrai3799 commented 6 months ago

Hi, thank you for your quick response. Despite using torch.randn, I get distorted output. Here's an example:

[image: input, noise, prediction]

And this happens for all the examples I tested, not just this one. Do you have any insights into why this is happening?

Thanks again!

phizaz commented 6 months ago

I'm not sure what the use case is here. Can you tell me the big picture? This doesn't seem like a use case mentioned in the paper.