phizaz / diffae

Official implementation of Diffusion Autoencoders
https://diff-ae.github.io/
MIT License
862 stars · 130 forks

DiffAE without sampling + other questions #38

Closed · lucasrelic99 closed this issue 1 year ago

lucasrelic99 commented 1 year ago

Hello, thanks for the great work; it's quite interesting. I have a couple of questions.

Thank you for your input. If you have time constraints, the first two questions are my main interest, although I'm curious about your thoughts on all of them :)

phizaz commented 1 year ago

> If we are not interested in the sampling ability of DiffAE, and merely reconstruction, it is sufficient to just train DiffAE without the latent DPM (commented as 'train the autoenc model' in some of the provided training scripts), correct?

Correct.
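In pseudocode, the two training stages look roughly as follows (function and config names here are illustrative, not the repo's actual API); for reconstruction-only use, the second stage is simply dropped:

```
# stage 1: train the autoenc model (semantic encoder + conditional DDIM)
conf = autoenc_config()
train(conf)

# stage 2 (optional; only needed for unconditional sampling):
# infer z_sem for the whole training set, then fit the latent DPM on it
# latent_conf = latent_dpm_config()
# train(latent_conf)
```

With only stage 1, reconstruction works because an image can be encoded to z_sem (and inverted to x_T) and then decoded; the latent DPM is only needed to sample new z_sem.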

> I saw in another issue (although I can't find it anymore) that you performed some experiments on regularization of z_sem. Did you observe any performance issues besides a less meaningful and interpretable semantic space?

By regularization, do you mean z-norm? Z-normalization, which makes z_sem zero-mean with unit variance, is crucial for both the sampleability of z_sem and its manipulability.
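As a minimal sketch of the z-normalization described here (hypothetical variable names; in practice the codes come from the semantic encoder): per-dimension mean and standard deviation are computed over the training set, used to whiten z_sem before fitting the latent DPM, and inverted afterwards:

```python
import numpy as np

# hypothetical stand-in for z_sem codes of a training set
rng = np.random.default_rng(0)
z_sem = rng.normal(loc=3.0, scale=5.0, size=(1000, 512))

# z-normalization: zero mean, unit variance per dimension
mu = z_sem.mean(axis=0)
sigma = z_sem.std(axis=0)
z_norm = (z_sem - mu) / sigma

# the latent DPM is trained on z_norm; sampled codes are mapped back
# with z = z_norm * sigma + mu before conditioning the decoder
print(z_norm.mean(), z_norm.std())  # ≈ 0 and ≈ 1
```

The inverse transform must be applied to any z_norm sampled from the latent DPM, otherwise the decoder sees codes on the wrong scale.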

> I'm a bit confused about how the semantic encoder is trained. Is a reconstruction loss simply calculated between the input and output images, and the gradient passed through the U-Net and into the semantic encoder?

Correct; there are no other training signals or constraints.
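To illustrate that training signal with a toy sketch (a linear encoder and a linear decoder standing in for the semantic encoder and the U-Net; not the actual model): the only loss is reconstruction, and its gradient flows through the decoder into the encoder, which is the only way the encoder is updated:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 8))        # toy "images"
We = rng.normal(size=(8, 2)) * 0.1  # encoder weights -> z_sem (2-dim bottleneck)
Wd = rng.normal(size=(2, 8)) * 0.1  # decoder weights (stands in for the U-Net)

lr = 0.1
for _ in range(500):
    Z = X @ We                       # z_sem
    X_hat = Z @ Wd                   # reconstruction
    err = X_hat - X
    loss = (err ** 2).mean()         # the only training objective
    # backprop: reconstruction loss -> decoder -> encoder
    gWd = Z.T @ err * (2 / err.size)
    gWe = X.T @ (err @ Wd.T) * (2 / err.size)
    We -= lr * gWe
    Wd -= lr * gWd

print(f"final reconstruction loss: {loss:.3f}")  # well below the initial ~1.0
```

The encoder receives no direct supervision on z_sem; it is shaped entirely by what helps the decoder reconstruct, which mirrors the training setup described above.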

> I don't see any problems in doing this, but I was wondering whether you have any comments about training DiffAE in some feature space (image encodings, for example).

Nothing in particular. However, one of my hypotheses is that DiffAE coming up with a semantically meaningful representation by itself might not be a universal property; it might not work every time.

lucasrelic99 commented 1 year ago

Thank you!