tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Bayesian Neural network cifar10 example not converging #271

Open Raj-DIAL opened 5 years ago

Raj-DIAL commented 5 years ago

I tried the cifar10_bnn example with bayesian_vgg as the architecture and the default hyper-parameters. However, the loss keeps increasing and the model's results are not consistent. I ran the bayesian_neural_network MNIST example, which works well, and I also ran a small network, which seems to work too. I think something is wrong in the model implementation of bayesian_vgg, or am I missing something?

Raj-DIAL commented 5 years ago

An example run:

Step: 800 Loss: 36.070 Accuracy: 0.270 KL: 34.035
Step: 900 Loss: 40.010 Accuracy: 0.279 KL: 38.245
Step: 1000 Loss: 44.503 Accuracy: 0.288 KL: 42.445
Step: 1100 Loss: 48.428 Accuracy: 0.295 KL: 46.632
 ... Held-out nats: -2.318
 ... Validation Accuracy: 0.376
Step: 1200 Loss: 52.387 Accuracy: 0.302 KL: 50.807
Step: 1300 Loss: 56.643 Accuracy: 0.311 KL: 54.970
Step: 1400 Loss: 60.890 Accuracy: 0.318 KL: 59.120
Step: 1500 Loss: 64.937 Accuracy: 0.325 KL: 63.257
 ... Held-out nats: -2.260
 ... Validation Accuracy: 0.402
Step: 1600 Loss: 69.104 Accuracy: 0.332 KL: 67.381
Step: 1700 Loss: 72.996 Accuracy: 0.339 KL: 71.492
Step: 1800 Loss: 76.954 Accuracy: 0.347 KL: 75.590
Step: 1900 Loss: 81.383 Accuracy: 0.353 KL: 79.674
 ... Held-out nats: -2.303
 ... Validation Accuracy: 0.429
SiegeLordEx commented 5 years ago

I believe you're just seeing the effect of KL annealing. The example, by default, anneals the KL term from 0 to 1 over 50 epochs; 50 epochs is 50 * 50000 / 128 ≈ 19,500 steps, so I wouldn't expect the loss to decrease at all until the annealing is done. You can certainly turn off the annealing (the kl_annealing flag), but if you do, the network might not reach final parameters that are as good as those found with the annealing in place.
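
For readers unfamiliar with the pattern, here is a minimal, hypothetical sketch of KL annealing in a TF2/Keras-style training step. The toy model, constants, and names such as `kl_scale` are illustrative assumptions, not the actual code of the cifar10_bnn example:

```python
import tensorflow as tf
import tensorflow_probability as tfp

NUM_TRAIN = 50000
BATCH_SIZE = 128
ANNEAL_EPOCHS = 50
STEPS_PER_EPOCH = NUM_TRAIN // BATCH_SIZE  # ~390 steps per epoch

# Toy Bayesian classifier; the real example uses bayesian_vgg.
model = tf.keras.Sequential([
    tfp.layers.Convolution2DFlipout(16, kernel_size=3, activation='relu',
                                    input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tfp.layers.DenseFlipout(10),
])

optimizer = tf.keras.optimizers.Adam()

def train_step(images, labels, step):
    # Ramp the KL weight linearly from 0 to 1 over ANNEAL_EPOCHS epochs.
    kl_scale = tf.minimum(
        1.0, tf.cast(step, tf.float32) / (ANNEAL_EPOCHS * STEPS_PER_EPOCH))
    with tf.GradientTape() as tape:
        logits = model(images)
        nll = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits))
        # Flipout layers register their KL divergences as layer losses;
        # divide by the dataset size to get a per-example quantity.
        kl = sum(model.losses) / NUM_TRAIN
        loss = nll + kl_scale * kl  # ELBO-style loss with annealed KL term
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, kl
```

Under a scheme like this, the reported total loss can keep rising for roughly the first 19,500 steps simply because the KL weight keeps growing, even while the negative log-likelihood and accuracy improve, which is consistent with the log above.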