Closed lucasrelic99 closed 1 year ago
If we are not interested in the sampling ability of DiffAE, and merely reconstruction, it is sufficient to just train DiffAE without the latent DPM (commented as 'train the autoenc model' in some of the provided training scripts), correct?
Correct.
I saw in another issue (although I can't find it anymore) you performed some experiments on regularization of z_sem. Did you observe any performance issues besides a less meaningful and interpretable semantic space?
By regularization do you mean z-norm? Z-normalization, i.e. making z_sem zero-mean with unit variance, is crucial for both the sampleability of z_sem and its manipulability.
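For reference, the z-norm described here can be sketched as follows: fit the mean and standard deviation of z_sem once over the training set, then normalize each code before fitting or sampling the latent DPM (the shapes, names, and the denormalization helper below are illustrative, not the repo's actual API):

```python
import numpy as np

# Illustrative stand-in for encoder outputs over the training set:
# N semantic codes of dimension D (z_sem is 512-d in the paper).
rng = np.random.default_rng(0)
z_sem = rng.normal(loc=3.0, scale=5.0, size=(10_000, 512))

# Fit the normalization statistics once, over the whole training set.
mu = z_sem.mean(axis=0)
sigma = z_sem.std(axis=0)

def z_norm(z):
    """Zero-mean, unit-variance normalization of semantic codes."""
    return (z - mu) / sigma

def z_denorm(z):
    """Invert the normalization before conditioning the decoder."""
    return z * sigma + mu

z_normed = z_norm(z_sem)
print(round(float(z_normed.mean()), 4), round(float(z_normed.std()), 4))
```

After this transform the latent DPM sees codes on a standard scale, which is what makes sampling and latent-space arithmetic (e.g. attribute manipulation by moving along a direction) behave consistently across dimensions.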
I'm a bit confused about how the semantic encoder is trained. Is a reconstruction loss simply calculated between the input and output images, with the gradient passed through the U-Net and into the semantic encoder?
Correct, there is no other training signal or constraints.
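To make the gradient path concrete, here is a minimal PyTorch sketch (toy linear stand-ins for the real encoder and U-Net, assuming torch is installed): the encoder sees the clean image, its output z_sem conditions the noise predictor, and the single denoising loss back-propagates through the U-Net into the encoder with no extra objective.

```python
import torch
import torch.nn as nn

# Toy stand-ins (not the real architectures): a semantic encoder and a
# "U-Net" noise predictor conditioned on z_sem via concatenation.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(16, 8))  # x0 -> z_sem
unet = nn.Linear(16 + 8, 16)                             # (x_t, z_sem) -> eps_hat

x0 = torch.randn(4, 16)             # clean "images"
eps = torch.randn_like(x0)          # sampled noise
alpha = 0.7                         # stand-in for the noise schedule at step t
x_t = alpha**0.5 * x0 + (1 - alpha)**0.5 * eps  # noised input

z_sem = encoder(x0)                               # condition from the clean image
eps_hat = unet(torch.cat([x_t, z_sem], dim=1))    # conditional noise prediction

# The only training signal: the denoising (reconstruction) loss.
loss = nn.functional.mse_loss(eps_hat, eps)
loss.backward()

# Gradients reach the encoder purely through the U-Net.
print(encoder[1].weight.grad is not None)  # True
```

So the encoder is trained end-to-end by the same loss that trains the decoder; there is no separate encoder objective or constraint.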
I don't see any problems in doing this, but I'm wondering if you have any comments about training DiffAE within some feature space (image encodings, for example).
Nothing in particular. However, one of my hypotheses is that DiffAE coming up with a semantically meaningful representation by itself might not be a universal property; it might not work every time.
Thank you!
Hello, thanks for the great work, it is quite interesting. I have a couple of questions.
Thank you for your input! If you have time constraints, the first two questions are my largest interests, although I am curious about your thoughts on all of them :)