nbro opened this issue 4 years ago
@jvdillon, @davmre, @brianwa84 What is your suggestion?
To me, the fact that you apparently cannot change kernel_posterior_tensor_fn and bias_posterior_tensor_fn seems to be a big limitation, i.e. the way the actual weights are sampled during the forward pass should be configurable even after initializing and training the network. Can this be done or not?
I recommend using tfd.Deterministic with loc=tf.Variable(....).
@jvdillon If I use Deterministic, do I need to create a completely new model where the posteriors are deterministic and initialised with the locs of the other model? Is that your suggestion?
I would greatly appreciate it if you could provide a very simple example that implements your suggestion.
What is the easiest way (in TFP) to convert a Bayesian neural network to a standard neural network?

More precisely, I would like to build a standard neural network S where the weights of layer l are the means of the distributions of layer l of the Bayesian neural network B. I would like to do this after (or before) having trained the Bayesian NN (i.e. as soon as the Bayesian model is created).

The first solution would be to use bijectors, which I haven't yet had the opportunity to use, but I know that TFP provides bijectors that can be used to transform one distribution into another. TFP also provides a Deterministic probability distribution, which essentially represents a constant. So, the first solution would consist in having a bijector that converts each of the Gaussians to a Deterministic distribution, where the loc of the Gaussian distribution corresponds to the loc of the Deterministic distribution (and I would ignore the scale of the Gaussian distribution). After having had a quick look at the documentation, it doesn't seem that TFP already provides such a bijector, but I am probably wrong. Anyway, I don't know if it is worth having a bijector for each of the distributions of the layers, only to create a standard NN from the Bayesian one.

The second solution I thought about was to override the values of the properties kernel_posterior_tensor_fn and bias_posterior_tensor_fn that I initially pass when I construct the layer. Initially and by default, the value of these properties is a lambda function that returns a call to the sample method of the distribution d that is passed as a parameter to this lambda function (i.e. lambda d: d.sample()). So, the idea would be to override the value of these properties to be a lambda function that calls the mean() method of the distribution (rather than the sample() method), i.e. lambda d: d.mean(). However, the properties kernel_posterior_tensor_fn and bias_posterior_tensor_fn are respectively called in the methods _apply_variational_kernel and _apply_variational_bias, which are both called in the call method, which is called when the layer is first built. So, AFAIK, there is no way of overriding these properties after the model has been built.

The third solution would be just to loop over the layers property of the model and get the means of the kernels and biases for each layer of the Bayesian NN. Then I would construct a standard NN with these means as the weights of the layers (i.e. by calling model.layers[i].set_weights(get_kernel_and_bias_means(bayesian_model.layers[i]))). However, this solution requires that I construct another model with non-Bayesian layers, which is, of course, cumbersome and tedious.

Of course, it would be nice if TFP already provided an out-of-the-box solution to my use case. It's possible I have missed it, because this problem may be common when using TFP. I am using TF 2.1 and TFP 0.9, and I am using tf.keras to construct the models (and I would like to keep using them).