Closed: mikowals closed this issue 8 years ago
Thanks! I've been trying to figure out the problem. I also found another minor difference in the batch normalization of the decoder. I've pushed the code. But 98.65% after 10 epochs? I get something closer to 91%. Did you make any other changes?
I was running with `num_labeled` at either 50k or 60k examples to try to hit the 0.58% error rate reported in the paper. With the larger number of labels, convergence is much faster.
The end result after changing the initialization was a ~0.70% error rate, and I also noticed something funny in `update_batch_norm`. After fixing that I was able to hit a 0.57% error rate, so I think the rest of the code works. I will file separate issues for that and a few other small fixes.
I ran it with 100 examples. I'll look into the other issues.
Thanks very much for sharing your work. I was struggling to implement a version of this myself.
This code, however, doesn't match the outcomes in the paper, and I think it is because of initializations like:
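A sketch of that pattern; the helper name `bi` and its signature are assumptions for illustration, not the exact code from this repo:

```python
import tensorflow as tf

def bi(inits, size, name):
    # the scale is applied outside the tf.Variable, so when inits is 0.0
    # the variable's output (and its gradient) is zeroed out
    return inits * tf.Variable(tf.ones([size]), name=name)

beta = bi(0.0, 100, 'beta')  # behaves like a constant 0 during training
```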
I think the built-in TensorFlow optimizers can only update the trainable tf.Variables, so by multiplying the tf.Variable by 0 (effectively removing it from the optimization) a very different model is being fit.
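A quick TF1-style toy check of that claim (illustrative only, not code from this repo):

```python
import tensorflow as tf

v = tf.Variable(tf.ones([3]), name="beta")
scaled = 0.0 * v  # the multiply-by-zero happens outside the variable
loss = tf.reduce_sum(tf.square(scaled - 1.0))
grad = tf.gradients(loss, [v])[0]  # gradient w.r.t. v is identically zero

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # [0. 0. 0.] -> the optimizer can never move v
```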
You can see this by trying to duplicate the paper results using the `g_gauss` function in the decoder and seeing that the `d_cost` outputs never change.

I think a fix is to move `inits` into the initialization value when defining a new variable. This way the value of `inits` will only impact the beginning value. Like this:
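A sketch of that fix, using the same assumed helper as above; `inits` now only sets the starting value:

```python
import tensorflow as tf

def bi(inits, size, name):
    # inits is folded into the initial value; the variable itself stays trainable
    return tf.Variable(inits * tf.ones([size]), name=name)

beta = bi(0.0, 100, 'beta')  # starts at zero but the optimizer can update it
```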
Though training is still running, the `d_cost` values are updating and accuracy is much higher - 98.65% after 10 epochs.