tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Bayesian neural networks with non-Gaussian priors #1038

Open deepakdalakoti opened 4 years ago

deepakdalakoti commented 4 years ago

Hi,

I am trying to use TensorFlow Probability to learn a Bayesian neural network. I want to learn the response y_t from input features x_t, i.e.

y_t = f(x_t) + eps

where f(x_t) is the output of the neural network and eps models the aleatoric uncertainty. As a first step, I assume all weights have a Gaussian prior with zero mean and unit variance, while eps is modelled as zero-mean, unit-variance noise. This can be achieved with the following example, taken from https://colab.research.google.com/github/tensorchiefs/dl_book/blob/master/chapter_08/nb_ch08_03.ipynb

import tensorflow_probability as tfp
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

tfd = tfp.distributions

# Scale the KL terms by the number of training examples (x holds the training data).
kernel_divergence_fn = lambda q, p, _: tfd.kl_divergence(q, p) / (x.shape[0] * 1.0)
bias_divergence_fn = lambda q, p, _: tfd.kl_divergence(q, p) / (x.shape[0] * 1.0)

def NLL(y, distr):
    # Negative log-likelihood of the observations under the predicted distribution.
    return -distr.log_prob(y)

def normal_sp(params):
    # Gaussian likelihood with learned mean and fixed unit scale.
    return tfd.Normal(loc=params[:, 0:1], scale=1.0)

inputs = Input(shape=(10,))

hidden = tfp.layers.DenseFlipout(10,
                                 bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
                                 bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
                                 kernel_divergence_fn=kernel_divergence_fn,
                                 bias_divergence_fn=bias_divergence_fn,
                                 activation="relu")(inputs)
hidden = tfp.layers.DenseFlipout(5,
                                 bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
                                 bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
                                 kernel_divergence_fn=kernel_divergence_fn,
                                 bias_divergence_fn=bias_divergence_fn,
                                 activation="relu")(hidden)
params = tfp.layers.DenseFlipout(1,
                                 bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
                                 bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
                                 kernel_divergence_fn=kernel_divergence_fn,
                                 bias_divergence_fn=bias_divergence_fn)(hidden)
dist = tfp.layers.DistributionLambda(normal_sp)(params)

model_vi = Model(inputs=inputs, outputs=dist)
model_vi.compile(Adam(learning_rate=0.0002), loss=NLL)
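
Training then works along these lines (the epoch and batch settings below are arbitrary placeholders I chose for illustration, not tuned values):

# Hypothetical call; x and y are my training arrays (x has 10 features).
model_vi.fit(x, y, epochs=500, batch_size=32, verbose=0)

# The Flipout layers resample weights on every forward pass, so repeated
# predictions give a Monte Carlo picture of the epistemic uncertainty.
predictions = [model_vi.predict(x) for _ in range(100)]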

This trains fine. However, I want a Gamma prior on the variance of the noise parameter eps, i.e.

eps ~ N(0, sigma),  sigma ~ Gamma(a1, b1)

How can I implement this in the TensorFlow Probability framework? I think I need to add another neuron to the last DenseFlipout layer and change the prior and posterior functions to something that samples from the product of a normal and a Gamma distribution, but I am not sure exactly how to implement this; a rough sketch of the first part is below.
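
To make the "extra neuron" part concrete, this is roughly what I have in mind. It only turns the noise scale into a learned output; it does not yet place the Gamma prior on it, which is the part I am missing:

import tensorflow as tf

# Sketch only: the final layer now emits two values per input,
# a mean and an unconstrained scale parameter.
params = tfp.layers.DenseFlipout(2,
                                 bias_posterior_fn=tfp.layers.util.default_mean_field_normal_fn(),
                                 bias_prior_fn=tfp.layers.default_multivariate_normal_fn,
                                 kernel_divergence_fn=kernel_divergence_fn,
                                 bias_divergence_fn=bias_divergence_fn)(hidden)

def normal_sp(params):
    # softplus keeps the scale positive; the small offset avoids a scale of zero
    return tfd.Normal(loc=params[:, 0:1],
                      scale=1e-3 + tf.math.softplus(params[:, 1:2]))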

simeoncarstens commented 4 years ago

I'm no expert, but if I understand correctly, it looks to me as if you could use tfp.distributions.JointDistributionSequential for this. It seems to implement exactly this kind of product rule, and given that your statistical model is p(mu, sigma) = p(mu | sigma) x p(sigma), with the two factors given by a normal distribution and a Gamma distribution, this would be exactly what you need. So all you would need to do is define your model as something like

def statistical_model(params):
    mu, _ = tf.split(params, 2, axis=1)  # only the mean comes from the network
    joint = tfd.JointDistributionSequential([
        tfd.Gamma(concentration=a1, rate=b1),           # p(sigma)
        lambda sigma: tfd.Normal(loc=mu, scale=sigma),  # p(mu | sigma)
    ])
    return joint

instead of normal_sp(params). Note that in JointDistributionSequential the distribution that others depend on has to come first, which is why the Gamma precedes the Normal. I haven't really tried this, though, and I'm not sure whether tf.split() is the right idea here. /edit: I think my brain was somewhere else when I wrote this; fixed several errors in the above.
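
For what it's worth, here is a tiny standalone example of the JointDistributionSequential mechanics; a1 = b1 = 1.0 is an arbitrary choice just to make it runnable:

import tensorflow_probability as tfp
tfd = tfp.distributions

joint = tfd.JointDistributionSequential([
    tfd.Gamma(concentration=1.0, rate=1.0),          # sigma ~ Gamma(a1, b1)
    lambda sigma: tfd.Normal(loc=0.0, scale=sigma),  # eps | sigma ~ N(0, sigma)
])

sigma_sample, eps_sample = joint.sample()        # draws (sigma, eps) in order
lp = joint.log_prob([sigma_sample, eps_sample])  # joint log-density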

kaijennissen commented 3 years ago

@deepakdalakoti Did you manage to use a hierarchical prior? If so, I would be interested in how you did it.