piEsposito / blitz-bayesian-deep-learning

A simple and extensible library to create Bayesian Neural Network layers on PyTorch.
GNU General Public License v3.0

The number of parameters is doubled #89

Open sansiro77 opened 3 years ago

sansiro77 commented 3 years ago

Here is the simplest example.

from blitz.modules import BayesianLinear

fc1 = BayesianLinear(1, 1)
print(list(fc1.parameters()))
pytorch_total_params = sum(p.numel() for p in fc1.parameters() if p.requires_grad)
print("total parameters:", pytorch_total_params)

The output is:

[Parameter containing:
tensor([[0.0651]], requires_grad=True), Parameter containing:
tensor([[-7.1001]], requires_grad=True), Parameter containing:
tensor([-0.0429], requires_grad=True), Parameter containing:
tensor([-6.9712], requires_grad=True), Parameter containing:
tensor([[0.0651]], requires_grad=True), Parameter containing:
tensor([[-7.1001]], requires_grad=True), Parameter containing:
tensor([-0.0429], requires_grad=True), Parameter containing:
tensor([-6.9712], requires_grad=True)]
total parameters: 8

The parameters are fc1.weight_mu, fc1.weight_rho, fc1.bias_mu, fc1.bias_rho, fc1.weight_sampler.mu, fc1.weight_sampler.rho, fc1.bias_sampler.mu, fc1.bias_sampler.rho, respectively, which is twice as many as expected.
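The duplication is easy to see by listing the parameter names directly (plain PyTorch, nothing blitz-specific):

for name, p in fc1.named_parameters():
    print(name, tuple(p.shape))
# weight_mu, weight_rho, bias_mu, bias_rho each appear a second time
# under weight_sampler.* / bias_sampler.*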

sansiro77 commented 3 years ago

My current solution is:

count = 0
for name, param in net.named_parameters():
    if ("sampler" not in name) and param.requires_grad:
        count += param.numel()
print(count)
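For convenience, the same filter can be wrapped in a small helper (just a sketch; count_bayesian_params is an illustrative name, not part of blitz):

def count_bayesian_params(model):
    # count each trainable parameter once, skipping the duplicate
    # copies registered under the *_sampler submodules
    return sum(p.numel() for name, p in model.named_parameters()
               if "sampler" not in name and p.requires_grad)

print(count_bayesian_params(fc1))  # 4 for BayesianLinear(1, 1)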

Philippe-Drolet commented 3 years ago

I am also wondering what these parameters mean respectively, thank you

sansiro77 commented 3 years ago

> I am also wondering what these parameters mean respectively, thank you

In Bayesian neural networks, each parameter (every "weight" and "bias") is a random variable with its own distribution, which is Gaussian here. "mu" is the mean, and "rho" parameterizes the standard deviation via self.sigma = torch.log1p(torch.exp(self.rho)) (a softplus, which keeps sigma positive). A "sampler" draws a concrete value from that distribution every time the model runs a forward pass.
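For intuition, here is a minimal sketch of that sampling step (the reparameterization trick; the variable names are illustrative, not blitz's internals, and the values are taken from the output above):

import torch

mu = torch.tensor([[0.0651]])
rho = torch.tensor([[-7.1001]])

sigma = torch.log1p(torch.exp(rho))   # softplus keeps sigma positive
eps = torch.randn_like(mu)            # fresh noise on every forward pass
w = mu + sigma * eps                  # one sampled weight value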

Philippe-Drolet commented 3 years ago

Thanks for the reply! I knew that, but what I mean is that, from what I have seen with BNNs in general, the weight distribution is rarely a perfect normal distribution centered at mu with scale sigma (it is usually more of a Gaussian mixture), yet here every weight distribution I obtain looks exactly like that. Is it variational inference that always gives perfect normal distributions?

sansiro77 commented 3 years ago

In the paper "Weight Uncertainty in Neural Networks", the authors use a Gaussian variational posterior together with a scale-mixture-of-Gaussians prior. So the fitted posterior over each weight is a single Gaussian by construction; the mixture only appears in the prior.
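For reference, that prior is a mixture of two zero-mean Gaussians. A rough sketch of its log-density (pi, sigma1, sigma2 are the paper's hyperparameters; the default values below are only placeholders):

import torch
from torch.distributions import Normal

def log_scale_mixture_prior(w, pi=0.5, sigma1=1.0, sigma2=0.002):
    # log [ pi * N(w; 0, sigma1^2) + (1 - pi) * N(w; 0, sigma2^2) ]
    p1 = Normal(0.0, sigma1).log_prob(w).exp()
    p2 = Normal(0.0, sigma2).log_prob(w).exp()
    return torch.log(pi * p1 + (1 - pi) * p2)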