In most typical usage, batch normalization is only applied during training, and a moving average of the batch statistics is tracked for inference time, when the batch size tends to be 1. Because of this, Phase.infer and Phase.test use variables in the graph that tracked the stddev/mean of the batches during training.
I feel like the infer/test paths may need better documentation to clear this up. Are you having the problem when using pt.Phase.train as well?
Yes, that is correct @eiderman. I use phase=pt.Phase.train during training and phase=pt.Phase.test during testing. I haven't permuted them yet (i.e. tried train for test, etc.).
I've checked the implementation and it should be doing the correct thing. @jramapuram, would you mind explaining the graph to me? Also, how does this impact the evaluation metrics for the relevant loss on the test set?
I have a convolutional variational autoencoder that maps to a two-dimensional latent space and thus disentangles the manifold seen above (of MNIST). When I do not use the phase=* option (in the scope) I see fig. 1, which is the expected result. When I add the phase=* option I see fig. 2. I have tried re-training many times, but still face the same issue. With regards to metrics: since this is unsupervised, it is slightly hard to quantify.
My train/test objects are simply this [note that in train the phase defaults to phase=pt.Phase.train and is thus omitted]:
with tf.variable_scope("z"):  # Encode our data into z and return the mean and covariance
    self.z_mean, self.z_log_sigma_sq = self.encoder(self.inputs, latent_size)
    self.z = tf.add(self.z_mean,
                    tf.mul(tf.sqrt(tf.exp(self.z_log_sigma_sq)), eps))
    # Get the reconstructed mean from the decoder
    self.x_reconstr_mean = self.decoder(self.z, self.input_size)
    self.z_summary = tf.histogram_summary("z", self.z)

with tf.variable_scope("z", reuse=True):  # The test z
    self.z_mean_test, self.z_log_sigma_sq_test = self.encoder(self.inputs, latent_size,
                                                              phase=pt.Phase.test)
    self.z_test = tf.add(self.z_mean_test,
                         tf.mul(tf.sqrt(tf.exp(self.z_log_sigma_sq_test)), eps))
    # Get the reconstructed mean from the decoder
    self.x_reconstr_mean_test = self.decoder(self.z_test, self.input_size,
                                             phase=pt.Phase.test)
Batch normalization is behaving correctly, but I would really like to understand this phenomenon more because it may have modeling implications on best practice for BN.
One experiment that may help verify this is to check whether your test results are as good when running smaller batches rather than all 10k at once. It may be that normalizing the output based on all test examples results in a cleaner embedding. The default inference behavior of BN is geared towards generating correct and stable predictions for small batch sizes.
It would be interesting to see how the accuracy changes on the test set if you were to attach a softmax layer to the embedding (without training the lower layers, by using no_gradients()) and test it on various batch sizes.
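As a rough illustration, such a frozen linear probe might look like the sketch below in plain TensorFlow, using tf.stop_gradient in place of PrettyTensor's no_gradients(); the label placeholder and probe variables are illustrative, not part of the code under discussion.

# Hypothetical linear probe: train only (w, b) on top of the frozen embedding.
labels = tf.placeholder(tf.float32, [None, 10])    # one-hot MNIST labels
z_frozen = tf.stop_gradient(self.z_mean)           # block gradients into the encoder
w = tf.Variable(tf.zeros([latent_size, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(z_frozen, w) + b

probe_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
probe_train_op = tf.train.AdamOptimizer(1e-3).minimize(probe_loss, var_list=[w, b])
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1)), tf.float32))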
Yet another aspect that would be interesting to test is which projection works better as a VAE. Since one of the goals is to make a decoder that can be easily sampled to generate new results, I suspect that a denser region of digits may work better, since there are less likely to be junk regions that produce non-digits within the sample space.
@eiderman : Will give it a shot for smaller batch sizes (i.e. same as training). However, this still doesn't answer why it would work when no phase parameter is provided. Does batch normalization turn off without a provided phase parameter?
I'm not sure the softmax layer makes any sense. This is a pure unsupervised problem. There are no class labels that can be provided to update the softmax's weights & biases. I'm assuming you would be talking about a softmax+cross-entropy as an optimization objective.
Jason, with the phase set, batch normalization looks like:
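Conceptually, the two code paths differ as in the simplified sketch below (not PrettyTensor's actual implementation; the helper name and the default decay/epsilon values are illustrative):

def batch_norm(x, beta, gamma, moving_mean, moving_variance, phase,
               decay=0.003, epsilon=0.001):
    # Simplified sketch of phase-dependent batch normalization.
    if phase == pt.Phase.train:
        # Normalize with the statistics of the current batch and fold them
        # into the moving averages that inference will rely on later.
        mean, variance = tf.nn.moments(x, axes=list(range(x.get_shape().ndims - 1)))
        with tf.control_dependencies([
                moving_mean.assign_sub(decay * (moving_mean - mean)),
                moving_variance.assign_sub(decay * (moving_variance - variance))]):
            return tf.nn.batch_normalization(x, mean, variance, beta, gamma, epsilon)
    # Phase.test / Phase.infer: normalize with the tracked averages, so one
    # example's output does not depend on the other items in the batch.
    return tf.nn.batch_normalization(x, moving_mean, moving_variance, beta, gamma, epsilon)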
If you do not set the phase, it defaults to 'train' in both cases. This means that the version without the phase set is performing normalization during inference using the test-set activations, which is not really a good thing: it can easily push the network outside of the ranges seen during training, and a test example's prediction may be sensitive to the other items in the batch. In your case it appears to have made your model do a better separation, but there are some caveats:
To test 1 & 2, I would recommend either computing the test reconstruction loss (preferable) or attaching a classification loss and only training the classification layer. While I suggested softmax before, I think nearest neighbor vs. the train set may work just as well for a smoke test.
To test 3, just sample from the model and make sure to hit the white space on your graph to see how the digits look. Doing enough of these to achieve statistical significance would be hard, but sampling from a VAE using a Gaussian should probably give you equal probability.
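For example, one quick way to eyeball this (a sketch against the graph built above, assuming the test-phase tensors are reachable as attributes of cvae and that feeding the intermediate z_test tensor directly is acceptable):

import numpy as np

# Sample latent codes from the standard-normal prior; place a few of them
# deliberately in the "white space" regions of the 2-D scatter plot.
z_sample = np.random.randn(FLAGS.batch_size, 2).astype(np.float32)

# Feed the sampled codes straight into the test-phase decoder path.
generated = sess.run(cvae.x_reconstr_mean_test,
                     feed_dict={cvae.z_test: z_sample})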
@eiderman: I updated my logic to do inference using only batch_size samples at a time, as such:
def plot_2d_cvae(sess, source, cvae):
    z_mu = []
    y_sample = []
    for _ in range(np.floor(10000.0 / FLAGS.batch_size).astype(int)):
        x_sample, y = source.test.next_batch(FLAGS.batch_size)
        z_mu.append(cvae.transform(sess, x_sample))
        y_sample.append(y)

    z_mu = np.vstack(z_mu)
    y_sample = np.vstack(y_sample)
    print 'z.shape = ', z_mu.shape, ' | y_sample.shape = ', y_sample.shape

    plt.figure(figsize=(8, 6))
    plt.scatter(z_mu[:, 0], z_mu[:, 1], c=np.argmax(y_sample, 1))
    plt.colorbar()
    plt.savefig("models/2d_cluster.png", bbox_inches='tight')
    #plt.show()
When the phase is set to test it looks like the same issue is present:
However, setting phase=train for both test & train, it accurately separates the manifold:
To address your points:

1. The 2d representation is perfectly sufficient for MNIST, as the manifold has been proven to be separable in this manner via the SOM, autoencoder and t-SNE literature, so I don't believe that is the issue at hand.
2. There is no intermingling going on. Phase.train is used on the parameters that are optimized during training time using the MNIST training data. Phase.test is used at test time with the reused parameters (i.e. weights/biases) but operating on the MNIST test data. The training loss after around 400 epochs is 138.141. This is the standard (two-part) VAE loss. I haven't had the time to add an extra layer and such.
3. I am not using it as a generative model for the above use case, merely as one to visualize a disentangled feature space. However, here is a visualization of the reconstruction as requested for both cases (one with Phase.train for train & Phase.test for test [the correct method] and one with Phase.train set for both the test & train functions [the incorrect method that proves that batch_normalization is NOT working accurately]).

Listed below is the reconstruction when Phase.test is set accurately:

And here it is when using Phase.train:

When using batch normalization with the running mean, it appears to be projecting to roughly the same location (as per the reconstruction). Thus I believe that there is something wrong with the batch_normalization implementation on the conv2d op.
My apologies for being obtuse. BN is working as intended, but there is a gotcha (which I am currently fixing). In order for you to update the averaged mean and variance variables, you need to run the update ops on each iteration.
These are executed by adding a dependency on pt.with_update_ops as documented here: https://github.com/google/prettytensor/blob/master/docs/pretty_tensor_top_level.md#apply_optimizerlosses-regularizetrue-include_markedtrue
This is really a poor API to trickle out to other users, so I will fix it so that the updates are part of the graph.
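In plain TensorFlow terms, the pattern looks roughly like the sketch below (this assumes the moving mean/variance update ops are collected under the graph's UPDATE_OPS key; PrettyTensor's bookkeeper may store them differently, so treat it as an illustration rather than the library's API):

# Make the optimizer step depend on the batch-norm moving-average updates,
# so Phase.test later has sensible statistics to normalize with.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # loss: the VAE objective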
Great! Thanks!
I added a fix to automatically compute the running variance/mean for inference time. If you have any other issues, please let me know!
I'm a little surprised at how poorly the model did with the initial variance (1.0) and mean (0.0). I would have expected the training to have made it somewhat resilient to scale and shift of features.
Great! Will give it a shot and get back
Thanks for the assistance @eiderman ! It is working as intended now.
I believe that there is an error when using phase in the defaults_scope coupled with batch_normalize=True. Basically it looks like this:
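Roughly the following (a sketch of the scope assignment with illustrative layer sizes and hyper-parameters; see the linked cvae.py below for the actual code):

def encoder(self, inputs, latent_size, phase=pt.Phase.train):
    with pt.defaults_scope(activation_fn=tf.nn.elu,
                           batch_normalize=True,
                           learned_moments_update_rate=0.0003,
                           variance_epsilon=0.001,
                           scale_after_normalization=True,
                           phase=phase):
        params = (pt.wrap(inputs)
                  .reshape([FLAGS.batch_size, 28, 28, 1])
                  .conv2d(5, 32, stride=2)
                  .conv2d(5, 64, stride=2)
                  .flatten()
                  .fully_connected(latent_size * 2, activation_fn=None)).tensor
    # Split into the mean and log-variance of the latent code.
    return params[:, :latent_size], params[:, latent_size:]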
Full code here: https://github.com/jramapuram/CVAE/blob/master/cvae.py
If I remove phase=phase within the scope assignment, my model produces the following:
However, when setting the phase appropriately, I get the following:
This is trained for the same number of iterations using the same model.