tbepler / spatial-VAE

Source code for "Explicitly disentangling image content from translation and rotation with spatial-VAE" - NeurIPS 2019
MIT License

Inquiry about how to obtain visualizations of samples from spatial-VAEs #1

Open AbbyYangbb opened 4 years ago

AbbyYangbb commented 4 years ago

Hello! Thank you for your time reading this!

Your work on spatial-VAE is very impressive. I really appreciate that you released your code; I've managed to run train_mnist.py and got reasonable values for the ELBO, Error, and KL.

Also, the animations (GIFs) of learned motions of different bio-particles in the README.md are very helpful for novices (like me) to understand the main idea of your paper. Could I also generate similar images (like each frame in your GIFs) using my own dataset? If so, would you please give me some suggestions on how to achieve that?

Thank you in advance!

tbepler commented 4 years ago

Of course, I'm glad to hear the code release is helpful. To generate images from the model, you need to select some point in latent space ($z$) and then use the generative network to get the conditional distribution over the pixel values. This part of the MNIST training script and the equivalent in the train_particles script do this. The animations here show the mean of this distribution (y_mu) as the point moves through latent space.
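As a minimal sketch of that generation step, assuming a trained spatial generator `p_net` with roughly the interface used in train_mnist.py (a coordinate grid plus z in, per-pixel means out); the argument names and shapes here are illustrative rather than copied from the repo:

```python
import numpy as np
import torch

def generate_image(p_net, z_dim, n=28, device="cpu"):
    # Build the fixed coordinate grid spanning [-1, 1] x [-1, 1],
    # one (x, y) pair per output pixel.
    xgrid = np.linspace(-1, 1, n)
    ygrid = np.linspace(1, -1, n)
    x0, x1 = np.meshgrid(xgrid, ygrid)
    coords = np.stack([x0.ravel(), x1.ravel()], axis=1)
    coords = torch.from_numpy(coords).float().to(device)  # (n*n, 2)

    # Sample an unstructured latent z from the standard normal prior.
    z = torch.randn(1, z_dim, device=device)

    with torch.no_grad():
        # Decode: the spatial generator returns the conditional mean
        # over pixel values (y_mu) at each coordinate.
        y_mu = p_net(coords.unsqueeze(0), z)

    return y_mu.view(n, n).cpu().numpy()
```

Interpolating z between two such samples and saving each decoded y_mu as a frame is essentially how an animation like the README GIFs can be assembled.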

I've been meaning to add a Jupyter notebook to this repo with a code sample. Hopefully I'll be able to get to it sometime this week.

AbbyYangbb commented 4 years ago

Thank you very much for your prompt reply!

I appreciate your plan to add a code example explaining the image generation. However, when it comes to selecting points in the latent space, I am a little confused about how to do that. Since I noticed that z is obtained using the InferenceNetwork, I wonder whether InferenceNetwork is the "magic" that constrains the latent variables to represent only rotation and translation?

Looking forward to your kind reply!

tbepler commented 4 years ago

I was referring to the unstructured latent variables with z. These are given a standard normal prior by the training procedure. If you want to perform inference on z for some specific image, then you can use the inference network, but this is not required to generate images with the generative model.

The rotation and translation parameters are separate, structured latent variables. The structure is imposed through the generative network, not the inference network, by transforming the coordinates, which are then decoded by the spatial generator network. This is described in the paper, but see the "if rotate" and "if translate" sections of the eval_minibatch function to see how this is done in the code.
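For illustration, here is a minimal sketch of that coordinate transformation idea; the variable names (theta, dx) are hypothetical and not the exact code from eval_minibatch:

```python
import torch

def transform_coords(coords, theta, dx):
    """Rotate and then translate an (N, 2) grid of (x, y) coordinates.

    coords : (N, 2) tensor of pixel coordinates
    theta  : scalar tensor, rotation angle in radians (structured latent)
    dx     : (2,) tensor, translation offset (structured latent)
    """
    # 2x2 rotation matrix built from the rotation latent.
    rot = torch.stack([
        torch.stack([torch.cos(theta), -torch.sin(theta)]),
        torch.stack([torch.sin(theta), torch.cos(theta)]),
    ])
    # Rotate the grid, then shift it. The spatial generator decodes pixel
    # values at these transformed positions, so rotation and translation
    # never have to be absorbed by the unstructured latent z.
    return coords @ rot.t() + dx
```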

AbbyYangbb commented 4 years ago

I see. Thank you very much for your clarification!