nv-tlabs / LION

Latent Point Diffusion Models for 3D Shape Generation

Results from "evaluate a trained prior" differ from the sample points in your directory. #74

Closed JudgeLJX closed 3 weeks ago

JudgeLJX commented 1 month ago

As titled: I cannot reproduce the results from the code in "evaluate a trained prior".

Besides, there is another part that confuses me.

1. p(z0) is assumed to be a standard Gaussian, so a plain VAE would sample from p(z0) and feed the sample into the decoder to generate shapes. If there were no diffusion model, the sampling procedure would be: draw z0 from the standard Gaussian p(z0), use z0 to get h0, and finally decode from h0 and z0. Is this right?

2. In your case, does the diffusion model learn the variational distribution q(z0|x)? If it instead learned p(z0), the diffusion process would just map a standard normal to a standard normal.

3. When all training is finished, is the process as follows: sample from a standard normal, use diffusion to produce a better latent representation, and let this diffusion result substitute for the original sample from p(z0) (the no-diffusion case)? Then feed this into the second diffusion model, and finally pass the result to the decoder for reconstruction.

4. Are these two diffusion models trained simultaneously, since h0 is conditioned on z0?
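For concreteness, the no-diffusion sampling path described in question 1 can be sketched as follows. This is a minimal toy sketch: `PointDecoder`, `sample_without_diffusion`, and all dimensions are illustrative stand-ins, not LION's actual classes or API, and p(h0|z0) is mocked as a standard normal.

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    """Toy decoder: maps the global latent z0 plus per-point latents h0 to xyz points.
    Illustrative stand-in, not LION's actual decoder."""
    def __init__(self, z_dim=16, h_dim=4, n_points=64):
        super().__init__()
        self.n_points, self.h_dim = n_points, h_dim
        self.net = nn.Linear(z_dim + h_dim, 3)

    def forward(self, z0, h0):
        # Broadcast the global latent to every point, then decode to xyz.
        z_rep = z0.unsqueeze(1).expand(-1, self.n_points, -1)
        return self.net(torch.cat([z_rep, h0], dim=-1))

@torch.no_grad()
def sample_without_diffusion(decoder, z_dim=16, batch=2):
    """The hypothesized no-diffusion path: z0 ~ N(0, I) -> h0 ~ p(h0|z0) -> decode."""
    z0 = torch.randn(batch, z_dim)
    # p(h0|z0) is mocked as a standard normal here, purely for illustration.
    h0 = torch.randn(batch, decoder.n_points, decoder.h_dim)
    return decoder(z0, h0)

dec = PointDecoder()
pts = sample_without_diffusion(dec)
print(pts.shape)  # torch.Size([2, 64, 3])
```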

Many thanks for your time.

ZENGXH commented 4 weeks ago

Are you able to generate reasonable shapes from the prior but the numbers do not match, or do the generated shapes not make sense?

  1. Correct.
  2. It learns E_p(x) q(z0|x). When training the VAE, p(z0) is N(0, I), but since we apply a small KL weight, q(z0|x) ends up very far from N(0, I). Therefore, after the VAE is trained, we freeze q(z0|x), i.e. the VAE, and learn a diffusion model for the prior pθ(z0), which maps N(0, I) to a complex distribution closer to E_p(x) q(z0|x). See Section 3 of the paper for details: https://arxiv.org/pdf/2210.06978
  3. Correct. N(0, I) is the z_T of the diffusion model, which turns it into z_0 drawn from a complex distribution.
  4. The training of the two models has no dependence: during training, h0 is sampled from qφ(h0|x, z0) (see Eq. (7) above). At inference, we first sample z0 from pθ(z0), then sample h0 from pψ(h0|z0).
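The inference order in answers 3 and 4 can be sketched as two ancestral sampling loops chained in sequence. This is a toy sketch only: `EpsNet`, `ancestral_sample`, `lion_style_sample`, and the crude reverse update are illustrative assumptions, not LION's actual networks or noise schedule.

```python
import torch
import torch.nn as nn

class EpsNet(nn.Module):
    """Toy noise predictor for a latent diffusion model (optionally conditioned)."""
    def __init__(self, dim, cond_dim=0):
        super().__init__()
        self.net = nn.Linear(dim + cond_dim + 1, dim)

    def forward(self, x, t, cond=None):
        feats = [x, t.expand(x.shape[0], 1)]  # append the timestep as a feature
        if cond is not None:
            feats.append(cond)                # condition on z0 for the h0 model
        return self.net(torch.cat(feats, dim=-1))

@torch.no_grad()
def ancestral_sample(eps_net, shape, cond=None, steps=10):
    """Start from x_T ~ N(0, I) and iteratively denoise toward x_0.
    The update rule is deliberately simplified, not a real DDPM schedule."""
    x = torch.randn(shape)
    for step in reversed(range(steps)):
        t = torch.full((1, 1), step / steps)
        x = x - eps_net(x, t, cond) / steps
    return x

@torch.no_grad()
def lion_style_sample(z_prior, h_prior, decoder, z_dim=8, h_dim=8, batch=2):
    # 1) global prior: map N(0, I) noise to z0 from the learned complex distribution
    z0 = ancestral_sample(z_prior, (batch, z_dim))
    # 2) conditional prior: map N(0, I) noise to h0, conditioned on z0
    h0 = ancestral_sample(h_prior, (batch, h_dim), cond=z0)
    # 3) decode z0 and h0 into the final output
    return decoder(torch.cat([z0, h0], dim=-1))

z_net = EpsNet(8)
h_net = EpsNet(8, cond_dim=8)
decoder = nn.Linear(16, 3)  # toy decoder: latents -> a single xyz point
print(lion_style_sample(z_net, h_net, decoder).shape)  # torch.Size([2, 3])
```

The key point from answer 4 is visible in the structure: only inference chains the two samplers; during training each model fits its own target independently.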
JudgeLJX commented 3 weeks ago

I can now generate reasonable results. Thanks for your reply.