kumar-shridhar / PyTorch-BayesianCNN

Bayesian Convolutional Neural Network with Variational Inference based on Bayes by Backprop in PyTorch.
MIT License

about q_logvar_init and p_logvar_init #27

Closed · ShellingFord221 closed this issue 4 years ago

ShellingFord221 commented 5 years ago

Hi, in your new code, q_logvar_init and p_logvar_init are parameters in BBBconv2d (in your old code, they have fixed values). What are their initial values? I didn't find them in your code. Thanks!

kumar-shridhar commented 5 years ago

Hi @ShellingFord221,

Values are defined inside the Bayesian AlexNet architecture (screenshot of the model definition).

The path is `PyTorch-BayesianCNN/ImageRecognition/utils/BayesianModels/BayesianAlexNet.py`.
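
For readers who cannot view the screenshot, here is a minimal sketch of the pattern: the two values are ordinary constants set in the model file and handed to every Bayesian layer's constructor. The stub class, its signature, and the numeric values below are placeholders for illustration, not the repository's actual code or settings.

```python
import math
import torch.nn as nn

class BBBConv2dStub(nn.Module):
    """Stand-in for the repo's BBBConv2d; it only records the two init values."""
    def __init__(self, q_logvar_init, p_logvar_init, in_ch, out_ch, kernel_size, **kw):
        super().__init__()
        self.q_logvar_init = q_logvar_init   # initial log-variance of the posterior q(w)
        self.p_logvar_init = p_logvar_init   # log-variance of the fixed prior p(w)

class BBBAlexNetSketch(nn.Module):
    def __init__(self, num_classes, inputs,
                 q_logvar_init=0.05, p_logvar_init=math.log(0.05)):  # placeholder numbers
        super().__init__()
        # the same two values are passed into every Bayesian layer of the network
        self.conv1 = BBBConv2dStub(q_logvar_init, p_logvar_init, inputs, 64, kernel_size=11)
        self.conv2 = BBBConv2dStub(q_logvar_init, p_logvar_init, 64, 192, kernel_size=5)
```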

ShellingFord221 commented 5 years ago

Hi, in your code you fix the mean and variance of the real (prior) distribution of the weights, i.e. w_p ~ N(0, 0.05), while the mean and variance of the approximate distribution w_q are learnable parameters (initialized with stdv and q_logvar_init). But why are the parameters of the real distribution fixed? If we know the real distribution, why should we approximate it by q rather than just use those values directly?

kumar-shridhar commented 5 years ago

Hi, we need to define a boundary (or starting point) to keep the distributions contained in that domain; otherwise they can vary a lot and are hard to make converge. So we start with a fixed prior, use the local reparameterization trick (LRT) to compute the change, and learn the approximate posterior from there. We take a zero mean and a predefined variance as the starting point. The variance could also be made learnable, but a zero-mean Normal distribution is a good start for applying the LRT efficiently. A different distribution could be used and its parameters learned as well, but that would be a lot to learn on top of the complexity CNNs already have, and in our experiments it became hard to converge. Hope that makes sense.
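
As a rough illustration of the setup described above (a fixed zero-mean Gaussian prior, a learnable Gaussian posterior, and the KL term between them), here is a minimal self-contained sketch. The class name, the default q_logvar_init, and the exact LRT/KL code are illustrative assumptions, not the repository's BBBConv2d implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianConv2dSketch(nn.Module):
    """Bayes-by-Backprop-style conv layer: fixed prior N(0, sigma_p^2),
    learnable Gaussian posterior over the weights. Not the repo's BBBConv2d."""

    def __init__(self, in_ch, out_ch, k, q_logvar_init=-5.0, p_logvar_init=math.log(0.05)):
        super().__init__()
        self.p_logvar = p_logvar_init                      # prior log-variance, fixed
        stdv = 1.0 / math.sqrt(in_ch * k * k)
        self.W_mu = nn.Parameter(torch.empty(out_ch, in_ch, k, k).uniform_(-stdv, stdv))
        self.W_logvar = nn.Parameter(torch.full((out_ch, in_ch, k, k), q_logvar_init))

    def forward(self, x):
        # Local reparameterization trick: sample the pre-activations, not the weights.
        act_mu = F.conv2d(x, self.W_mu)
        act_var = F.conv2d(x ** 2, self.W_logvar.exp())
        eps = torch.randn_like(act_mu)
        return act_mu + act_var.clamp(min=1e-16).sqrt() * eps

    def kl_loss(self):
        # KL( N(W_mu, exp(W_logvar)) || N(0, exp(p_logvar)) ), summed over all weights
        q_var = self.W_logvar.exp()
        p_var = math.exp(self.p_logvar)
        kl = 0.5 * (self.p_logvar - self.W_logvar
                    + (q_var + self.W_mu ** 2) / p_var - 1.0)
        return kl.sum()
```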

ShellingFord221 commented 4 years ago

Hi, I wonder whether the initial values of p_logvar and q_logvar affect the performance of the BCNN. Why or why not? Thanks.

ShellingFord221 commented 4 years ago

And why do you set the same initial value for the real distribution and the approximate distribution? Shouldn't they be different? Thanks!

kikyo97 commented 3 years ago

> Hi, in your code you fix the mean and variance of the real (prior) distribution of the weights, i.e. w_p ~ N(0, 0.05), while the mean and variance of the approximate distribution w_q are learnable parameters (initialized with stdv and q_logvar_init). But why are the parameters of the real distribution fixed? If we know the real distribution, why should we approximate it by q rather than just use those values directly?

I am also perplexed by this question. In BBBConv.py, prior_mu and prior_sigma are both fixed. This means the KL loss will pull W_mu and W_sigma toward the prior values (i.e., prior_mu and prior_sigma). However, how can we know that the prior values are suitable for our task? After all, in a CNN, backpropagation can help us optimize the weights.
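
For context on the trade-off being discussed, in Bayes-by-Backprop-style training the KL term is only one part of the objective; the data-fit term is still optimized by backpropagation. A rough sketch of the combined loss (function names and the beta weighting are illustrative, not the repository's code):

```python
import torch.nn.functional as F

def elbo_loss(outputs, targets, kl, beta):
    """Sketch of the variational objective: the likelihood term fits the data
    via backprop as in an ordinary CNN, while the KL term regularizes the
    posterior toward the fixed prior. beta scales the KL (e.g. 1/num_batches)."""
    nll = F.nll_loss(outputs, targets, reduction='mean')  # data-fit term
    return nll + beta * kl                                 # negative ELBO
```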