rinuboney / ladder

Ladder network is a deep learning algorithm that combines supervised and unsupervised learning.
MIT License

Larger error for MNIST MLP baseline supervised compared to paper's results #9

Open JLC7 opened 8 years ago

JLC7 commented 8 years ago

Hello,

I could not find that anyone had tried this TF version of the ladder network fully supervised to compare against the paper's baseline MLP result (Table 1), so I tested that case.

I am running into a problem: I get a much larger error when running the script entirely supervised (i.e. the loss is the corrupted supervised cost only, or equivalently all denoising costs set to zero) with num_labels = 60000. I tried two different Adam optimizer settings: (1) the defaults provided in this TF implementation, with LR decay starting at epoch 15, and (2) parameters matched to the paper's provided code, with LR decay starting at epoch 100, as seen here.

After running for 150 epochs, setting 1 performed better in this case (error decreases to ~10-15% early on but does not improve after that); with setting 2, the error is larger throughout and generally >90% after 150 epochs. The paper reports an error of 0.80 (± 0.03)%, so I am a long way off and not sure where the problem could be. I also tried removing all the code dealing with the unlabeled portion, since it is not used here, but no luck there. Any ideas on where the problem may be would be helpful.
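For what it's worth, "entirely supervised" here just means zeroing every per-layer denoising weight λ_l in the combined cost from the paper (supervised cross-entropy plus weighted reconstruction costs). A minimal numpy sketch of that combination — the function name and shapes are mine, not the repo's:

```python
import numpy as np

def total_ladder_cost(supervised_cost, z_clean, z_est_bn, lambdas):
    # Combined ladder cost: supervised cross-entropy plus per-layer
    # denoising (reconstruction) costs, each weighted by lambda_l and
    # averaged over batch and layer width.
    denoise = sum(lam * np.mean((zc - ze) ** 2)
                  for lam, zc, ze in zip(lambdas, z_clean, z_est_bn))
    return supervised_cost + denoise
```

With `lambdas = [0.0] * L` this reduces exactly to the supervised cost, which is the configuration I tested.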

Thanks!

ThinkLock commented 8 years ago

@JLC7 Same problem here. Could it be that the TF implementation is not suited to this problem? I got similar results on CIFAR-10: with fully supervised learning on 4,000 examples, accuracy only reaches 72-73%.

JLC7 commented 8 years ago

Looking into this more, there is also some discrepancy compared with the paper's results for the semi-supervised case. In the relevant lines of code, I set the Adam optimizer as follows:

```python
starter_learning_rate = 0.002
decay_after = 100
train_step = tf.train.AdamOptimizer(learning_rate=learning_rate,
                                    beta1=0.1, beta2=0.001,
                                    epsilon=1e-8).minimize(loss)
```

to match the paper's training: 100 epochs at LR = 0.002, then 50 epochs of LR decay, with Blocks' default Adam settings.
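Concretely, the schedule I am trying to match holds the learning rate constant until decay_after and then decays it linearly to zero; a small sketch (assuming 150 total epochs, as in the paper — the function is my own illustration):

```python
def learning_rate(epoch, base_lr=0.002, decay_after=100, num_epochs=150):
    # Constant base_lr until `decay_after`, then linear decay to 0 at
    # `num_epochs` (my reading of the paper's schedule).
    ratio = max(0.0, (num_epochs - epoch) / (num_epochs - decay_after))
    return base_lr * min(1.0, ratio)
```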

Here are averaged results for three models I replicated (I extended the code to the conv models from the paper, including max and global mean pooling; error bars are ± stdev):

  1. Ladder, full, 100 labels
  2. Conv-fc, 100 labels
  3. Conv-small-gamma model, 100 labels

(attached plots: 100_ladder_full, 100_conv_fc, 100_conv_gamma_small)

I did a couple of runs with the default Adam settings in the TF script (LR 0.02, decay_after 15) and the error rates seem better: model 1 converges to ~1.2% error and model 2 to ~1%. Model 3 seems about the same in this case, but I can't say with high confidence as I only ran it twice.

A possible issue may be differences between the Adam implementations in Blocks vs. TF. I also just noticed a line in the TF script here — should it instead be z_est_bn = (z_est[l] - m) / tf.sqrt(v + 1e-10)? Not sure if that makes a big difference. It might also help to do some visualization in TensorBoard.
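On the Adam point: if I am reading the Blocks source correctly, its Adam is parametrized by the complements of TF's decay rates — Blocks' defaults beta1=0.1, beta2=0.001 correspond to TF's beta1=0.9, beta2=0.999 — so passing the Blocks values straight into tf.train.AdamOptimizer would nearly disable the moment averaging. A numpy sketch of the two conventions (my own illustration of textbook Adam, not the repo's or either library's exact code):

```python
import numpy as np

def adam_step_tf(m, v, g, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    # TF-style convention: beta1/beta2 are the moment *decay* rates.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    return m, v, lr * m_hat / (np.sqrt(v_hat) + eps)

def adam_step_blocks(m, v, g, t, lr=0.002, beta1=0.1, beta2=0.001, eps=1e-8):
    # Blocks-style convention (as I read it): beta1/beta2 are the
    # *complements*, i.e. the fraction of the new gradient mixed in.
    m = beta1 * g + (1 - beta1) * m
    v = beta2 * g ** 2 + (1 - beta2) * v
    m_hat = m / (1 - (1 - beta1) ** t)
    v_hat = v / (1 - (1 - beta2) ** t)
    return m, v, lr * m_hat / (np.sqrt(v_hat) + eps)
```

Under this reading, the two produce the same step when beta_blocks = 1 - beta_tf, which would explain why copying beta1=0.1, beta2=0.001 into TF behaves badly.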

At least relatively, the conv models are performing better than the fully connected one, which matches the paper's results. At this point, I'll probably move away from benchmarking on MNIST and try it on the data I am actually working with. Any insights would be helpful!

JLC7 commented 8 years ago

I ran more trials with Adam at LR 0.02, decay after 15, and the default TF settings. Here are the results for the three models: (attached plots: mnist_ladder_full_100, mnist_conv_fc_100, mnist_conv_small_gamma_100). The convnets match pretty closely. The ladder-full model is still a little off. There is, however, still the issue of large error in the fully supervised case, and I am not sure where the problem is.

yqweixiang commented 7 years ago

Has anyone monitored the CIFAR-10 test error? I ran it with TensorFlow and got the test error curve attached (image).

The accuracy reaches 78% after 20 epochs of training.