wohlert / generative-query-network-pytorch

Generative Query Network (GQN) in PyTorch as described in "Neural Scene Representation and Rendering"
322 stars · 63 forks

Question about generator #9

Closed versatran01 closed 6 years ago

versatran01 commented 6 years ago

In the top docstring of generator.py, you mentioned that

The inference-generator architecture is conceptually
similar to the encoder-decoder pair seen in variational
autoencoders.

I don't quite understand this part and would really appreciate it if you could explain a bit or point me to some related articles. For the generator I can see how it is similar to a decoder: it takes the latent z, the query viewpoint v, and the aggregated representation r, and eventually outputs the image x_mu.

But I'm a bit confused about the inference network being the counterpart of the encoder.

wohlert commented 6 years ago

In the standard variational autoencoder we have the following (simplified) relation:

z ~ q(z|x)
x' ~ p(x'|z)
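For reference, the standard VAE relation above can be sketched with the reparameterisation trick. This is a toy numpy example; the "networks" and shapes are made up for illustration and are not from this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    # q(z|x): map the input to the mean and log-variance of a Gaussian.
    # A stand-in for a learned encoder network.
    mu = x.mean(keepdims=True)
    logvar = np.zeros_like(mu)
    return mu, logvar

def decode(z):
    # p(x'|z): map the latent back to observation space.
    return np.tile(z, 4)

x = np.array([0.5, 1.0, 1.5, 2.0])
mu, logvar = encode(x)
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * logvar) * eps   # z ~ q(z|x) via reparameterisation
x_recon = decode(z)                   # x' ~ p(x'|z) (mean of the likelihood)
```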

In the GQN we have the following:

u = 0
for i in 1..L:
    z_i ~ q(z_i | x, v, r, θ)
    u_i ~ p_θ(u_i | z_i, v, r)
    u += u_i
x' ~ p(x' | u)

We can show that these two formulations are equivalent by letting q(z) = ∏_i q(z_i | z_{i-1}) (again a simplification), i.e. q represents some autoregressive density over the stacked latents z = (z_1, ..., z_L).

z ~ q(z|x, v, r)
x' ~ p(x'|z, v, r)
versatran01 commented 6 years ago

Thanks for the explanation, this is becoming clearer to me now.

versatran01 commented 6 years ago

One more question, so the inference_core is conceptually similar to an encoder in a VAE, yet it doesn't shrink the dimension of the image (like all other encoders do). And the generator_core again is like a decoder but doesn't increase the dimension. Is this correct?

wohlert commented 6 years ago

It is not the inference_core alone that is the encoder, but the combination of it and posterior_density, which does reduce the dimension. However, shrinking the dimension is not a requirement for a VAE; it just enforces compression.

Likewise for the generator: it is the combination of the generator_core and u (which is upsampled) that makes up the decoder.

versatran01 commented 6 years ago

Ok, so this u is basically the canvas in the 'conceptual compression' paper, right?

wohlert commented 6 years ago

Yes, exactly.

versatran01 commented 6 years ago

In the GQN paper they have a VAE baseline. Is it just following the equations you mentioned above, i.e. is it a CVAE?

z ~ q(z|x, v, r)
x' ~ p(x'|z, v, r)
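These two equations describe a VAE in which both the posterior and the likelihood are conditioned on the viewpoint v and representation r, which is the conditional-VAE structure being asked about. A minimal sketch of the sampling path, with toy stand-in densities and invented shapes:

```python
import numpy as np

rng = np.random.default_rng(2)

def q(x, v, r):
    # z ~ q(z | x, v, r): posterior conditioned on image, viewpoint, representation
    mu = np.concatenate([x, v, r]).mean(keepdims=True)
    return mu + rng.standard_normal(mu.shape)   # reparameterised sample

def p(z, v, r):
    # x' ~ p(x' | z, v, r): the decoder also sees v and r, so z need only
    # carry what the conditioning variables do not already explain
    return np.tile(z, 4) + 0.1 * r[:4]

x = rng.standard_normal(8)    # flattened image (toy size)
v = rng.standard_normal(7)    # query viewpoint
r = rng.standard_normal(32)   # aggregated representation

z = q(x, v, r)
x_prime = p(z, v, r)
```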