tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

What is the easiest way to convert a Bayesian neural network to a standard neural network? #779

Open nbro opened 4 years ago

nbro commented 4 years ago

What is the easiest way (in TFP) to convert a Bayesian neural network to a standard neural network?

More precisely, I would like to build a standard neural network S where the weights of layer l are the means of the posterior distributions of layer l of the Bayesian neural network B. I would like to be able to do this either before or after training the Bayesian NN (i.e. at any point once the Bayesian model has been created).

The first solution would be to use bijectors, which I haven't yet had the opportunity to use, but I know that TFP provides bijectors that can be used to transform one distribution into another. TFP also provides a Deterministic distribution, which essentially represents a constant. So, the first solution would consist in a bijector that converts each of the Gaussians into a Deterministic distribution whose loc is the loc of the Gaussian (the scale of the Gaussian would be ignored). After a quick look at the documentation, it doesn't seem that TFP already provides such a bijector, but I may be wrong. In any case, I don't know whether it is worth having a bijector for each of the distributions of the layers only to create a standard NN from the Bayesian one.
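
To make the idea concrete, this is all the conversion amounts to for a single weight posterior (a tiny sketch; as said above, I don't think any existing bijector does this):

```python
import tensorflow_probability as tfp

tfd = tfp.distributions

# Collapse a Gaussian weight posterior to a point mass at its mean,
# discarding the scale.
gaussian = tfd.Normal(loc=0.3, scale=0.1)
point_mass = tfd.Deterministic(loc=gaussian.mean())
```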

The second solution I thought about was to override the values of the properties kernel_posterior_tensor_fn and bias_posterior_tensor_fn that I initially pass when I construct the layer. By default, the value of these properties is a lambda function that returns a call to the sample method of the distribution d passed to it (i.e. lambda d: d.sample()). The idea would be to override these properties with a lambda function that calls the mean() method of the distribution instead of sample() (i.e. lambda d: d.mean()). However, kernel_posterior_tensor_fn and bias_posterior_tensor_fn are respectively used in the methods _apply_variational_kernel and _apply_variational_bias, which are both invoked from the call method, i.e. on every forward pass (the first call is also what builds the layer). So, AFAIK, there is no way of overriding these properties after the model has been built.
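
At construction time, what I mean would look something like this (a minimal sketch with an arbitrary layer size; the problem is that I see no way to switch to this behaviour after the layer has been built):

```python
import tensorflow_probability as tfp

# Forward passes of this layer use the posterior means instead of samples.
mean_layer = tfp.layers.DenseReparameterization(
    units=10,
    activation='relu',
    kernel_posterior_tensor_fn=lambda d: d.mean(),
    bias_posterior_tensor_fn=lambda d: d.mean())
```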

The third solution would be just to loop over the layers property of the model and get the means of the kernel and bias posteriors for each layer of the Bayesian NN. Then I would construct a standard NN with these means as the weights of the layers (i.e. by calling model.layers[i].set_weights(get_kernel_and_bias_means(bayesian_model.layers[i]))). However, this solution requires constructing another model with non-Bayesian layers, which is, of course, cumbersome and tedious.
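
A rough sketch of this third solution, assuming the Bayesian model is a stack of tfp.layers.DenseReparameterization layers, standard_model is a parallel stack of tf.keras.layers.Dense layers with the same architecture, and get_kernel_and_bias_means is the helper named above:

```python
def get_kernel_and_bias_means(bayesian_layer):
    # The kernel/bias posteriors are exposed as distribution attributes
    # on the variational layers (once the layer has been built).
    kernel = bayesian_layer.kernel_posterior.mean().numpy()
    bias = bayesian_layer.bias_posterior.mean().numpy()
    return [kernel, bias]

def copy_posterior_means(bayesian_model, standard_model):
    for b_layer, s_layer in zip(bayesian_model.layers, standard_model.layers):
        s_layer.set_weights(get_kernel_and_bias_means(b_layer))
```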

Of course, it would be nice if TFP already provided an out-of-the-box solution to my use case; it's possible I have missed it, because this problem may be common when using TFP. I am using TF 2.1 and TFP 0.9, and I am using tf.keras to construct the models (and I would like to keep doing so).

nbro commented 4 years ago

@jvdillon, @davmre, @brianwa84 What is your suggestion?

To me, the fact that you apparently cannot change kernel_posterior_tensor_fn and bias_posterior_tensor_fn seems like a big limitation: the way the actual weights are sampled during the forward pass should be configurable even after the network has been initialized and trained. Can this be done or not?

jvdillon commented 4 years ago

I recommend using tfd.Deterministic with loc=tf.Variable(....).
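
Roughly something along these lines (an untested sketch; the posterior_fn signature is the one the variational layers pass to these callables):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# A point-mass "posterior" over the kernel: the layer then behaves like a
# standard dense layer parameterized by `loc`.
def deterministic_posterior_fn(dtype, shape, name, trainable, add_variable_fn):
    loc = add_variable_fn(
        name=name + '_loc',
        shape=shape,
        dtype=dtype,
        initializer=tf.keras.initializers.GlorotUniform(),
        trainable=trainable)
    dist = tfd.Deterministic(loc=loc)
    return tfd.Independent(dist, reinterpreted_batch_ndims=len(shape))

layer = tfp.layers.DenseReparameterization(
    units=10,
    kernel_posterior_fn=deterministic_posterior_fn,
    kernel_divergence_fn=None)  # no meaningful KL term for a point mass
```

If I remember correctly, tfp.layers.default_mean_field_normal_fn(is_singular=True) builds this kind of point-mass posterior already (it is the default bias_posterior_fn).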

nbro commented 4 years ago

@jvdillon If I use Deterministic, do I need to create a completely new model where the posteriors are deterministic and initialised with the locs of the other model? Is this your suggestion?

I would greatly appreciate it if you could provide a very simple example that implements your suggestion.