tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Mistake in KL divergence computation of Bayesian neural network #246

Closed (ghost closed this issue 5 years ago)

ghost commented 5 years ago

In the file probability/tensorflow_probability/examples/bayesian_neural_network.py, on line 263, the KL divergence is computed as the sum of the losses of each layer: kl = sum(neural_net.losses) / mnist_data.train.num_examples.
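For context, a condensed sketch of the pattern around that line (assuming TF2-style eager execution; the random batch and num_train_examples below are placeholders, not the example's actual MNIST pipeline):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Two variational layers; each one adds its own KL term to `neural_net.losses`.
neural_net = tf.keras.Sequential([
    tfp.layers.DenseFlipout(64, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(10),
])

images = tf.random.normal([32, 784])                         # placeholder batch
labels = tf.random.uniform([32], maxval=10, dtype=tf.int32)  # placeholder labels
num_train_examples = 60000                                   # placeholder dataset size

logits = neural_net(images)
labels_distribution = tfp.distributions.Categorical(logits=logits)

neg_log_likelihood = -tf.reduce_mean(labels_distribution.log_prob(labels))
kl = sum(neural_net.losses) / num_train_examples  # the line the issue refers to
elbo_loss = neg_log_likelihood + kl
```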

However, the KL divergence is not equal to the sum of the losses of the weights. Elsewhere in the TensorFlow Probability documentation, the KL divergence is computed with the built-in function tf.distributions.kl_divergence (for example, here), so the probability/tensorflow_probability/examples/bayesian_neural_network.py file should be updated to use the TensorFlow Probability KL divergence function as well.
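To illustrate, the built-in function computes the analytic KL between two distributions directly (a minimal sketch; the Normal parameters are just placeholders):

```python
import tensorflow_probability as tfp

tfd = tfp.distributions

q = tfd.Normal(loc=0.5, scale=1.0)  # e.g. a weight's approximate posterior
p = tfd.Normal(loc=0.0, scale=1.0)  # e.g. the prior over that weight
kl = tfd.kl_divergence(q, p)        # closed-form KL(q || p), a scalar Tensor
```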

SiegeLordEx commented 5 years ago

That example is written using TFP's variational Bayesian layers, and for those layers (by default) the losses are in fact computed using tfp.distributions.kl_divergence: https://github.com/tensorflow/probability/blob/50f38f3b0184c1a441921fb7826d9c76672ca44f/tensorflow_probability/python/layers/dense_variational.py#L297-L308

And here's where the default is set:

https://github.com/tensorflow/probability/blob/50f38f3b0184c1a441921fb7826d9c76672ca44f/tensorflow_probability/python/layers/dense_variational.py#L101
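In other words, the default is roughly equivalent to passing a divergence function like this explicitly (a sketch based on the linked source; see the file for the exact signature):

```python
import tensorflow_probability as tfp

tfd = tfp.distributions

# The divergence function receives the posterior q, the prior p, and a sample
# of the weights (ignored here), and returns the analytic KL between q and p.
kl_fn = lambda q, p, _: tfd.kl_divergence(q, p)

layer = tfp.layers.DenseFlipout(
    units=10,
    kernel_divergence_fn=kl_fn,  # same behaviour as the default
    bias_divergence_fn=kl_fn,
)
```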

ghost commented 5 years ago

Thanks! Perhaps the documentation should be more explicit about this. I assumed losses referred to the difference in the norms of the weights before and after the network updates, rather than the KL divergence between the weight distributions.