worldmodels / worldmodels.github.io

World Models

Mistake in diagram of ConvVAE, Appendix #8

Open janarch70 opened 6 years ago

janarch70 commented 6 years ago

In the Appendix ConvVAE diagram, relu conv 64x4 should read relu conv 64x5.

[screenshot attached, 2018-06-08]

hardmaru commented 6 years ago

Hi @janarch70

Thanks for the feedback. Here is the code for the ConvVAE encoder / decoder:

https://github.com/hardmaru/WorldModelsExperiments/blob/master/carracing/vae/vae.py

self.x = tf.placeholder(tf.float32, shape=[None, 64, 64, 3])

# Encoder
h = tf.layers.conv2d(self.x, 32, 4, strides=2, activation=tf.nn.relu, name="enc_conv1")
h = tf.layers.conv2d(h, 64, 4, strides=2, activation=tf.nn.relu, name="enc_conv2")
h = tf.layers.conv2d(h, 128, 4, strides=2, activation=tf.nn.relu, name="enc_conv3")
h = tf.layers.conv2d(h, 256, 4, strides=2, activation=tf.nn.relu, name="enc_conv4")
h = tf.reshape(h, [-1, 2*2*256])

# VAE
self.mu = tf.layers.dense(h, self.z_size, name="enc_fc_mu")
self.logvar = tf.layers.dense(h, self.z_size, name="enc_fc_log_var")
self.sigma = tf.exp(self.logvar / 2.0)
self.epsilon = tf.random_normal([self.batch_size, self.z_size])
self.z = self.mu + self.sigma * self.epsilon

# Decoder
h = tf.layers.dense(self.z, 4*256, name="dec_fc")
h = tf.reshape(h, [-1, 1, 1, 4*256])
h = tf.layers.conv2d_transpose(h, 128, 5, strides=2, activation=tf.nn.relu, name="dec_deconv1")
h = tf.layers.conv2d_transpose(h, 64, 5, strides=2, activation=tf.nn.relu, name="dec_deconv2")
h = tf.layers.conv2d_transpose(h, 32, 6, strides=2, activation=tf.nn.relu, name="dec_deconv3")

self.y = tf.layers.conv2d_transpose(h, 3, 6, strides=2, activation=tf.nn.sigmoid, name="dec_deconv4")
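
As a quick sanity check, here is a minimal sketch (not from the repo, assuming the VALID padding that tf.layers uses by default and stride 2 throughout; conv_out / deconv_out are just illustrative helper names) showing how the spatial sizes work out:

def conv_out(size, kernel, stride=2):
    # Output size of a VALID (no-padding) convolution.
    return (size - kernel) // stride + 1

def deconv_out(size, kernel, stride=2):
    # Output size of a VALID transposed convolution.
    return (size - 1) * stride + kernel

# Encoder: a 64x64 input through four 4x4 stride-2 convolutions.
s = 64
for k in [4, 4, 4, 4]:
    s = conv_out(s, k)    # 31, 14, 6, 2 -> flattened to 2*2*256 = 1024

# Decoder: a 1x1 feature map through deconvolutions with kernels 5, 5, 6, 6.
s = 1
for k in [5, 5, 6, 6]:
    s = deconv_out(s, k)  # 5, 13, 30, 64 -> back to a 64x64 image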

Here's the corresponding ConvVAE diagram, along with its description from the Appendix: each convolution and deconvolution layer uses a stride of 2. The layers are indicated in the diagram in italics as Activation-type Output Channels x Filter Size. All convolutional and deconvolutional layers use relu activations except for the output layer, which uses a sigmoid since we need the output to be between 0 and 1.
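
Reading the labels straight off the code above under that convention (a small sketch, not taken from the paper itself):

# Diagram labels implied by the code above, in the
# "Activation-type OutputChannels x FilterSize" convention.
encoder_layers = [("relu conv", 32, 4), ("relu conv", 64, 4),
                  ("relu conv", 128, 4), ("relu conv", 256, 4)]
decoder_layers = [("relu deconv", 128, 5), ("relu deconv", 64, 5),
                  ("relu deconv", 32, 6), ("sigmoid deconv", 3, 6)]
for name, channels, kernel in encoder_layers + decoder_layers:
    print(f"{name} {channels}x{kernel}")
# The decoder's 64-channel layer comes out as "relu deconv 64x5",
# which matches the correction suggested above.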