In the standard variational autoencoder we have the following (simplified) relation:
z ~ q(z|x)
x' ~ p(x'|z)
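For concreteness, a minimal PyTorch sketch of those two sampling steps (the networks, names and sizes here are made up purely for illustration):

```python
import torch

# Hypothetical encoder/decoder; names and sizes are invented for illustration.
encoder = torch.nn.Linear(784, 2 * 32)  # outputs (mu, log_var) of q(z|x)
decoder = torch.nn.Linear(32, 784)      # outputs the mean of p(x'|z)

x = torch.randn(1, 784)

# z ~ q(z|x), via the reparameterisation trick
mu, log_var = encoder(x).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

# x' ~ p(x'|z); here we just take the likelihood's mean
x_prime = decoder(z)
```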
In the GQN we have the following:
u = 0
for i in 1..L:
    z_i ~ q(z_i | x, v, r, θ)
    u_i ~ p_θ(u_i | z_i, v, r)
    u += u_i
x' ~ p(x' | u)
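As a rough sketch of that loop (all module names and shapes below are invented, and the recurrent stochastic cores of the real model are collapsed into simple stand-ins):

```python
import torch

L = 8                                    # number of generation steps
z_dim, u_channels, H, W = 3, 32, 16, 16  # made-up sizes

# Stand-in for q(z_i | x, v, r); the real model uses a recurrent inference core.
def posterior(x, v, r):
    return torch.distributions.Normal(torch.zeros(1, z_dim, H, W), 1.0)

delta_net = torch.nn.Conv2d(z_dim, u_channels, kernel_size=1)  # maps z_i to a canvas increment u_i

x = v = r = None  # stand-ins for image, viewpoint and scene representation

u = torch.zeros(1, u_channels, H, W)     # u = 0, the canvas
for i in range(L):
    z_i = posterior(x, v, r).sample()    # z_i ~ q(z_i | x, v, r)
    u = u + delta_net(z_i)               # u += u_i (v and r also feed in here in the full model)

to_pixels = torch.nn.Conv2d(u_channels, 3, kernel_size=1)
x_prime = to_pixels(u)                   # mean of p(x' | u)
```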
We can show that these two formulations are the same by letting q(z | x, v, r) = prod_i q(z_i | z_1..z_{i-1}, x, v, r)
(again a simplification), meaning that q represents some autoregressive density.
z ~ q(z|x, v, r)
x' ~ p(x'|z, v, r)
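Written out, the equivalence amounts to grouping the per-step latents into a single latent with an autoregressive density, treating u as a function of the z_i for simplicity (a sketch, with conditioning details simplified):

```latex
% Group the step-wise latents into one latent with an autoregressive density:
z = (z_1, \dots, z_L), \qquad
q(z \mid x, v, r) = \prod_{i=1}^{L} q(z_i \mid z_{1:i-1},\, x, v, r).

% Since the canvas u is built up from the z_i (together with v and r),
% the observation model can be read as conditioning on z directly:
p(x' \mid z, v, r) = p\bigl(x' \mid u(z, v, r)\bigr).
```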
Thanks for the explanation, this is becoming clearer to me now.
One more question: the inference_core is conceptually similar to an encoder in a VAE, yet it doesn't shrink the dimension of the image (as most encoders do). And the generator_core is likewise similar to a decoder, but doesn't increase the dimension. Is this correct?
It is not the inference_core that is the encoder, but the combination of that and posterior_density, which does reduce the dimension. However, dimension shrinking is not a requirement for a VAE; it just enforces compression. Similarly with the generator: it is the combination of the generator_core and u (which is upsampled) that makes up the decoder.
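As a rough sketch of what I mean (the module names and shapes here are invented for illustration, not the actual classes in this repo):

```python
import torch

# Hypothetical stand-ins for the repo's components.
inference_core    = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)      # keeps spatial size
posterior_density = torch.nn.Conv2d(64, 2 * 3, kernel_size=4, stride=4)   # shrinks to latent params
upsample          = torch.nn.Upsample(scale_factor=4)
generator_core    = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)       # keeps spatial size

x = torch.randn(1, 3, 64, 64)

# "Encoder" = inference_core followed by posterior_density:
h = inference_core(x)                                # 1 x 64 x 64 x 64, no shrinking yet
mu, log_var = posterior_density(h).chunk(2, dim=1)   # 1 x 3 x 16 x 16 each: here is the compression
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

# "Decoder" = generator_core acting on the upsampled latent/canvas:
u = upsample(z)                                      # back to 1 x 3 x 64 x 64
x_prime = generator_core(u)
```

The point is that the compression happens in posterior_density, not in the core itself.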
Ok, so this u is basically the canvas in the 'conceptual compression' paper, right?
Yes, exactly.
In the GQN paper, they have a VAE baseline. Is it just following the equations you mentioned above, i.e. a CVAE?
z ~ q(z|x, v, r)
x' ~ p(x'|z, v, r)
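If so, I guess that baseline would look roughly like this, with the viewpoint and representation fed to both networks (a sketch with made-up names and sizes, just to check my understanding):

```python
import torch

x_dim, v_dim, r_dim, z_dim = 784, 7, 256, 32  # made-up sizes

# Hypothetical one-shot encoder/decoder, both conditioned on (v, r).
encoder = torch.nn.Linear(x_dim + v_dim + r_dim, 2 * z_dim)
decoder = torch.nn.Linear(z_dim + v_dim + r_dim, x_dim)

x, v, r = torch.randn(1, x_dim), torch.randn(1, v_dim), torch.randn(1, r_dim)

# z ~ q(z | x, v, r)
mu, log_var = encoder(torch.cat([x, v, r], dim=-1)).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

# x' ~ p(x' | z, v, r)
x_prime = decoder(torch.cat([z, v, r], dim=-1))
```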
In the top docstring of generator.py, you mentioned that
I don't quite understand this part, and I would really appreciate it if you could explain a bit or point me to some related articles. For the generator I can see how it is similar to a decoder: it takes the latent z, the query viewpoint v, and the aggregated representation r, and eventually outputs the image x_mu.
But I'm a bit confused by the inference network being the counterpart of the encoder.