phizaz / diffae

Official implementation of Diffusion Autoencoders
https://diff-ae.github.io/
MIT License

Training autoenc and Training latent DDM #9

Open hao-pt opened 2 years ago

hao-pt commented 2 years ago

May I ask a stupid question: what is the difference between training the autoencoder and training the latent DDM? As far as I understand, these two are trained at the same time. Can you enlighten me a little bit?

phizaz commented 2 years ago

They are not trained at the same time. You can train the autoencoder alone with only images (and you will get only the autoencoder). Since it is still an autoencoder, you CANNOT sample novel images from it yet. To generate new images, you need to be able to sample the "semantic code". To do this, you need a generative model, called the latent DPM in this case, trained on a pool of semantic codes (and to obtain that pool you need a trained autoencoder).
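
To make the two-stage separation concrete, here is a toy PyTorch sketch of the first stage: training the autoencoder on images alone. All module names, dimensions, and the MLP architectures are illustrative stand-ins, not the actual diffae code:

```python
# Stage 1 (toy sketch): train the diffusion autoencoder on images only.
import torch
import torch.nn as nn

T = 1000                                    # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, 0)   # cumulative product of (1 - beta_t)

def add_noise(x0, t):
    """Forward process q(x_t | x_0): noise a clean image at step t."""
    a = alpha_bar[t].view(-1, 1)
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

D, Z = 64, 16                               # flattened image dim, semantic-code dim
encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, Z))
denoiser = nn.Sequential(nn.Linear(D + Z + 1, 256), nn.ReLU(),
                         nn.Linear(256, D))  # predicts eps from (x_t, z_sem, t)

opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(denoiser.parameters()), 1e-3)
for step in range(100):                     # toy loop; real training is far longer
    x0 = torch.randn(32, D)                 # stand-in for a batch of images
    t = torch.randint(0, T, (32,))
    xt, eps = add_noise(x0, t)
    z = encoder(x0)                         # semantic code z_sem
    pred = denoiser(torch.cat([xt, z, t.float().unsqueeze(1) / T], 1))
    loss = (pred - eps).pow(2).mean()       # standard noise-prediction loss
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that nothing in this stage lets you draw a novel z: the encoder only maps existing images to codes, which is why a separate latent DPM is needed for unconditional sampling.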

hao-pt commented 2 years ago

Thank you for your response! I finally understood your point after re-checking Section 4 of your paper. By the way, have you experimented with sampling a latent from a VAE for the generative process of the DPM? What I mean is stacking a VAE on top of the diffusion model to get the latent vector z for the decoding process.

phizaz commented 2 years ago

I can think of one way to utilize a VAE: use it as a regularizer on the latent codes, so that the semantic codes become samples from a normal distribution. We have tried this. It was hard to strike a balance between the sample-ability (strong regularization) and the expressiveness (weak regularization) of the latent code. From a sample-quality perspective, it turned out to be better to learn another DDIM on top of the learned (and frozen) latent codes.
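
As a rough sketch of that second stage, assuming a trained encoder like the one in the earlier snippet: freeze it, encode the dataset into a pool of semantic codes, then fit a small DDPM-style denoiser on those codes (again with illustrative names and sizes, not the repo's API):

```python
# Stage 2 (toy sketch): train a latent DPM on frozen semantic codes.
import torch
import torch.nn as nn

T, D, Z = 1000, 64, 16
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, 0)

encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, Z))
encoder.requires_grad_(False)               # stage-1 weights are frozen here

# Pool of semantic codes produced by the frozen encoder.
with torch.no_grad():
    codes = encoder(torch.randn(1024, D))   # stand-in for encoding the dataset

# Latent denoiser: learns the distribution of z_sem so novel codes can be sampled.
latent_denoiser = nn.Sequential(nn.Linear(Z + 1, 128), nn.ReLU(),
                                nn.Linear(128, Z))
opt = torch.optim.Adam(latent_denoiser.parameters(), 1e-3)
for step in range(100):                     # toy loop
    z0 = codes[torch.randint(0, len(codes), (32,))]
    t = torch.randint(0, T, (32,))
    a = alpha_bar[t].view(-1, 1)
    eps = torch.randn_like(z0)
    zt = a.sqrt() * z0 + (1 - a).sqrt() * eps
    pred = latent_denoiser(torch.cat([zt, t.float().unsqueeze(1) / T], 1))
    loss = (pred - eps).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling a code from this latent model and then decoding it with the frozen conditional DDIM is what turns the autoencoder into a full generative model.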

fido20160817 commented 1 year ago

What affects the sample-ability and the expressiveness? I am a beginner with neural networks.