openai / glow

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
https://arxiv.org/abs/1807.03039
MIT License

The purpose of learntop argument #66

Open XuezheMax opened 6 years ago

XuezheMax commented 6 years ago

Hi,

I'd like to ask about the purpose of the learntop argument. As far as I can tell, it is only used in the following code:

def prior(name, y_onehot, hps):

    with tf.variable_scope(name):
        n_z = hps.top_shape[-1]

        h = tf.zeros([tf.shape(y_onehot)[0]]+hps.top_shape[:2]+[2*n_z])
        if hps.learntop:
            h = Z.conv2d_zeros('p', h, 2*n_z)
        if hps.ycond:
            h += tf.reshape(Z.linear_zeros("y_emb", y_onehot,
                                           2*n_z), [-1, 1, 1, 2 * n_z])

        pz = Z.gaussian_diag(h[:, :, :, :n_z], h[:, :, :, n_z:])

What is the purpose of feeding an all-zero tensor h into a convolution layer? Thanks.

naturomics commented 5 years ago

I hope this explanation helps:

  1. The latent space is constrained to a Gaussian distribution p(z; mean, scale), so to compute log p(z) we need z, mean, and scale. We get z = encoder(x), but its mean and scale are unknown. The solution here is to treat them as learnable (mean and scale are the two halves of h in the code): if learntop is true, they are trained as part of the model's parameters. As far as I can see, they always set learntop=true; otherwise (with ycond=false) h stays at its initial value of 0, so the mean and log-scale are both fixed at 0.
  2. Why feed a zero h into conv2d: the last layer of the encoder z = f(x) is a 1x1 conv layer (z has shape NHWC). Because a convolution shares its weights across patches, p(z) also shares the mean and scale parameters across the spatial dimensions, i.e. mean and scale should each have shape [1, 1, 1, C]. You could implement this as follows:
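    # zero init matches the h = tf.zeros(...) starting point in the repo
    h = tf.get_variable('h', [1, 1, 1, 2*n_z], initializer=tf.zeros_initializer())
    mean = h[:, :, :, :n_z]
    logscale = h[:, :, :, n_z:]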

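    This replaces the two ops h = tf.zeros(...) and h = Z.conv2d_zeros(...). However, you would then need to tile h along the N, H, and W dimensions so that it has the same shape as z for the subsequent calculations. The repo instead uses a trick: applied to an all-zero input, the convolution's output comes only from its learned bias terms and already has the full NHW shape, so no tile op is needed.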
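To make the zero-input trick concrete, here is a minimal NumPy sketch. It assumes a plain 1x1 convolution with a bias and nothing else; the helper `conv1x1` and all values are hypothetical and do not reproduce `Z.conv2d_zeros`, whose internals are not shown in this thread.

```python
import numpy as np

def conv1x1(x, w, b):
    """Plain 1x1 convolution in NHWC layout: a per-pixel linear map plus bias."""
    # x: [N, H, W, C_in], w: [C_in, C_out], b: [C_out]
    return np.einsum("nhwc,cd->nhwd", x, w) + b

rng = np.random.default_rng(0)
n, height, width, n_z = 2, 4, 4, 3

# Zero input, playing the role of `h = tf.zeros(...)` in the question.
h_in = np.zeros((n, height, width, 2 * n_z))

# Stand-ins for trained parameters (made-up values for illustration).
w = rng.normal(size=(2 * n_z, 2 * n_z))
b = rng.normal(size=(2 * n_z,))

h_out = conv1x1(h_in, w, b)

# The weights only ever multiply zeros, so the output is the bias,
# repeated at every batch element and spatial position.
same_everywhere = np.allclose(h_out, np.broadcast_to(b, h_out.shape))
print(h_out.shape, same_everywhere)
```

Under these assumptions the convolution acts as a learnable per-channel vector that is already expanded to the shape of z, which is why no explicit tile op is needed.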

Of course, it should also work to leave h with shape [1, 1, 1, 2*n_z] and skip the tile op, because TF will broadcast it for you.
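The broadcasting point can be checked with a small NumPy sketch, since NumPy and TF follow the same broadcasting rules for these shapes. The function `gaussian_diag_logp` is a hypothetical stand-in, not the repo's `Z.gaussian_diag`, and treating the second half of h as a log standard deviation is an assumption based on the name `logscale` used above.

```python
import numpy as np

def gaussian_diag_logp(z, mean, logsd):
    """Log-density of a diagonal Gaussian, summed over H, W and C per sample."""
    log_2pi = np.log(2.0 * np.pi)
    per_element = -0.5 * (
        log_2pi + 2.0 * logsd + (z - mean) ** 2 / np.exp(2.0 * logsd)
    )
    return per_element.sum(axis=(1, 2, 3))

rng = np.random.default_rng(0)
n, height, width, n_z = 2, 4, 4, 3
z = rng.normal(size=(n, height, width, n_z))

# One parameter vector shared by all positions, shape [1, 1, 1, 2*n_z].
# The values are made up; in training they would be learned.
h = rng.normal(scale=0.1, size=(1, 1, 1, 2 * n_z))
mean, logsd = h[..., :n_z], h[..., n_z:]

# Option 1: rely on broadcasting against z.
logp_broadcast = gaussian_diag_logp(z, mean, logsd)

# Option 2: tile explicitly along N, H and W first.
reps = (n, height, width, 1)
logp_tiled = gaussian_diag_logp(z, np.tile(mean, reps), np.tile(logsd, reps))

# With h fixed at 0 the prior reduces to a standard normal.
zeros = np.zeros((1, 1, 1, n_z))
logp_standard = gaussian_diag_logp(z, zeros, zeros)

print(np.allclose(logp_broadcast, logp_tiled))
```

Both options give the same per-sample log-density, so the choice between tiling, broadcasting, and the zero-input convolution is one of convenience rather than of model behavior, at least in this simplified setting.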

XuezheMax commented 5 years ago

Thanks a lot for your reply. It is pretty clear!

