Closed: mikowals closed this issue 8 years ago
Thanks! I've been trying to figure out the problem. I also found another minor difference in the batch normalization of the decoder. I've pushed the code. But 98.65% after 10 epochs? I get something closer to 91%. Did you make any other changes?
I was running with `num_labeled` at either 50k or 60k examples to try to hit the 0.58% error rate reported in the paper. With the larger number of labels, convergence is much faster.
The end result after changing the initialization was a ~0.70% error rate, and I also noticed something funny in `update_batch_norm`. After fixing that I was able to hit a 0.57% error rate, so I think the rest of the code works. I will file separate issues for that and a few other small fixes.
I ran it with 100 examples. I'll look into the other issues.
Thanks very much for sharing your work. I was struggling to implement a version of this myself.
This code, however, doesn't match the outcomes in the paper, and I think it is because of initializations like:
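A sketch of that pattern; the helper name `bi` and its signature are assumptions for illustration, not the exact code from this repo:

```python
import tensorflow as tf

def bi(inits, size, name):
    # the scale is applied outside the tf.Variable, so when inits is 0.0
    # the variable's output (and its gradient) is zeroed out
    return inits * tf.Variable(tf.ones([size]), name=name)

beta = bi(0.0, 100, 'beta')  # behaves like a constant 0 during training
```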
I think the built-in TensorFlow optimizers can only update the trainable tf.Variables, so by multiplying the tf.Variable by 0 (effectively removing it from the optimization) a very different model is being fit.
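A quick TF1-style toy check of that claim (illustrative only, not code from this repo):

```python
import tensorflow as tf

v = tf.Variable(tf.ones([3]), name="beta")
scaled = 0.0 * v  # the multiply-by-zero happens outside the variable
loss = tf.reduce_sum(tf.square(scaled - 1.0))
grad = tf.gradients(loss, [v])[0]  # gradient w.r.t. v is identically zero

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # [0. 0. 0.] -> the optimizer can never move v
```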
You can see this by trying to duplicate the paper results using the `g_gauss` function in the decoder and seeing that the `d_cost` outputs never change.

I think a fix is to move `inits` into the initialization value when defining a new variable. This way the value of `inits` will only impact the beginning value. Like this:
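A sketch of that fix, using the same assumed helper as above; `inits` now only sets the starting value:

```python
import tensorflow as tf

def bi(inits, size, name):
    # inits is folded into the initial value; the variable itself stays trainable
    return tf.Variable(inits * tf.ones([size]), name=name)

beta = bi(0.0, 100, 'beta')  # starts at zero but the optimizer can update it
```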
Though training is still running, the `d_cost` values are updating and accuracy is much higher - 98.65% after 10 epochs.