In the paper it is mentioned that a standard KL-annealed VAE is trained to get an initial approximation of the manifold.

Is the initial VAE supposed to have the same encoder/decoder architecture as the RNF WAE? In any case, since the initializations, parameters, etc. could be different, how is it guaranteed that the initial centers learned are the actual centers that occur during training of the RNF WAE?
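For context, "KL annealing" here means ramping the weight on the KL term from 0 toward 1 during early training to avoid posterior collapse. Below is a minimal PyTorch-style sketch of such a loss; the linear schedule and the `anneal_steps` parameter are illustrative assumptions, not taken from this repo or the paper.

```python
import torch
import torch.nn.functional as F

def kl_annealed_elbo(recon_logits, targets, mu, logvar, step, anneal_steps=10000):
    """Negative ELBO with a linearly annealed KL weight (illustrative schedule)."""
    # Reconstruction term: token-level cross-entropy, summed over the batch.
    recon = F.cross_entropy(
        recon_logits.view(-1, recon_logits.size(-1)),
        targets.view(-1),
        reduction="sum",
    )
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Anneal the KL weight from 0 to 1 over the first `anneal_steps` steps.
    beta = min(1.0, step / anneal_steps)
    return recon + beta * kl
```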
Yes, it is based on an LSTM VAE, but you can also try other VAE variants.

I haven't studied the effect of different VAE variants on the initialization of the latent manifold, so to your second question, I don't have a completely clear and solid answer at this moment. However, I can offer an intuitive explanation. First, note that the transformed latent manifold does not need to be the original manifold; all we need is for the two geometric structures to be as close to homeomorphic as possible. Second, a homeomorphism is not unique. Therefore, the result shouldn't be sensitive to initialization: every time you start from a different initial manifold, you can always find another manifold that is as close to homeomorphic to the original one as possible.
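To make the homeomorphism intuition concrete: normalizing flows are compositions of smooth, exactly invertible maps, so whichever initial manifold the pretrained VAE produces, the flow can only deform it continuously, never tear or glue it. Here is a minimal sketch of one such invertible map, a RealNVP-style affine coupling layer. This is a generic flow layer for illustration, not the Riemannian flow from the paper, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """A RealNVP-style coupling layer: smooth and exactly invertible,
    hence a homeomorphism of the latent space."""

    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.Tanh(),
            nn.Linear(64, 2 * (dim - self.half)),
        )

    def forward(self, z):
        # Leave z1 fixed; scale and shift z2 conditioned on z1.
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, z2 * log_s.exp() + t], dim=-1)

    def inverse(self, y):
        # Exact inverse: undo the shift, then the (always-positive) scale.
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * (-log_s).exp()], dim=-1)
```

Since `forward` and `inverse` are both continuous and compose to the identity, the layer is a homeomorphism of the latent space, which is the property the answer above appeals to.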
Thanks for the answer. Yes, I can intuitively understand your two points, and I agree. I guess it's at least likely that initialization does not play a role. But do you know if there is a theoretical guarantee of homeomorphism, or at least that most topological properties will be preserved?

I am still not convinced about the architecture. Does it mean that the RNF WAE should have the same architecture as the original VAE, whatever the original VAE is based on? For example, can I use a CNN VAE for the original VAE, but then use an LSTM encoder-decoder for the RNF WAE? It would be interesting if that were the case, because you could use a simple VAE for pre-training and clustering and then move on to more complex architectures for the RNF WAE.
@vikigenius Sorry for the late response. I think that is an interesting question. My intuition is that the architecture choice does not play a role here, as long as we are learning the same distribution; the architectures are just different ways to parameterize the same distribution family, in my opinion. I think I will do some experiments to see whether this is true, maybe after my ICLR 2020 deadline.
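To illustrate the "same distribution family, different parameterization" point: a CNN encoder and an LSTM encoder can both output the mean and log-variance of a diagonal Gaussian posterior, so they parameterize exactly the same family q(z|x) = N(mu, diag(exp(logvar))). A hypothetical sketch follows; all layer sizes and names are made up for illustration and are not from this repo.

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Convolutional encoder over token embeddings -> diagonal Gaussian."""
    def __init__(self, emb_dim=128, latent_dim=32):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1)
        self.head = nn.Linear(256, 2 * latent_dim)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).max(dim=-1).values
        return self.head(h).chunk(2, dim=-1)   # (mu, logvar)

class LSTMEncoder(nn.Module):
    """Recurrent encoder -> the same diagonal Gaussian family."""
    def __init__(self, emb_dim=128, latent_dim=32):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, 256, batch_first=True)
        self.head = nn.Linear(256, 2 * latent_dim)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        _, (h, _) = self.rnn(x)
        return self.head(h[-1]).chunk(2, dim=-1)  # (mu, logvar)
```

Under this view, swapping the pre-training encoder for a different one in the RNF WAE changes how the posterior parameters are computed, not which distributions are representable; whether the learned centers actually transfer across architectures is the empirical question the experiments mentioned above would answer.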