Closed: goldkim92 closed this issue 5 years ago.
Hi! It was impressive to see your code, especially using the parent class (`Optimizer`) to make the `SGLD` class in `sgld_optimizer.py`.

However, I'm wondering if the way the noise is added to the gradient is a little bit wrong. According to the SGLD paper (and also your ICML workshop paper), the update should be

Δθ_t = (ε_t / 2) (∇ log p(θ_t) + (N/n) Σ_i ∇ log p(x_i | θ_t)) + η_t,  with η_t ~ N(0, ε_t),

so the noise added to the gradient should be a sample from N(0, 1/lr), based on the fact that

lr * (a sample from N(0, 1/lr))

is equal in distribution to a sample from N(0, lr).
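A quick way to sanity-check this scaling identity numerically (a throwaway snippet, assuming PyTorch; `lr` is just an illustrative float):

```python
import torch

lr = 0.01
n = 1_000_000

# lr * (sample from N(0, 1/lr)): randn gives N(0, 1), so scale by std sqrt(1/lr)
a = lr * torch.randn(n) * (1.0 / lr) ** 0.5
# sample from N(0, lr): scale by std sqrt(lr)
b = torch.randn(n) * lr ** 0.5

print(a.var().item(), b.var().item())  # both ~= lr = 0.01
```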
I have the same question as goldkim92. Additionally, in the original SGLD paper the prior enters the update through the gradient of log p(\theta). I believe your implementation ignores this term, since it discards weight decay.
Sorry - hadn't looked at this in a while! I spent a little time on it and I think you're definitely right - fixed in 60bd5646a2f25885b1b1f70f5db4ecc4d6481b26. I'm still taking the sqrt of the term you suggest in the implementation, as PyTorch takes a scale parameter and not a variance parameter.
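For anyone reading along, a minimal sketch of what the fixed update amounts to (illustrative, not the actual `sgld_optimizer.py` code; the N(0, lr) noise is realised as `randn * sqrt(lr)` precisely because PyTorch works with scale/std rather than variance):

```python
import torch
from torch.optim import Optimizer

class SGLD(Optimizer):
    """SGD step on the (scaled) negative log posterior, plus N(0, lr) noise."""

    def __init__(self, params, lr=1e-2):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group["lr"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                # randn_like samples N(0, 1); multiply by the *scale*
                # sqrt(lr) to get noise with *variance* lr.
                noise = torch.randn_like(p) * lr ** 0.5
                p.add_(p.grad, alpha=-lr)
                p.add_(noise)
```

The 1/2 factor and minibatch rescaling from the paper are assumed here to be folded into `lr` and the loss.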
Also @JavierAntoran - yes, I intentionally discard the prior term.
Thanks for the quick reply. Could you explain why you made this decision?
Ha, maybe a GitHub issue isn't the best place to talk about this, but:
Hmm, I'm not sure I'm following your reasoning.
As I understand it, the goal of SGLD is to eventually draw samples from the posterior over the weights. We can then do Bayesian inference by integrating out the weights of the NN. In order for there to be a posterior distribution over weights, there needs to be a prior distribution. Otherwise, you are just doing noisy maximum likelihood?
Something like a Gaussian prior will penalise large weights. Intuitively, this will affect the shape of the loss surface as weight configurations which explain the data well but contain large magnitude weights will become less attractive.
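Concretely, for a Gaussian prior N(0, sigma^2), the gradient of log p(\theta) is -\theta / sigma^2, which is exactly a weight-decay term. A quick check (sigma is illustrative):

```python
import torch

sigma = 1.0  # prior std, illustrative
theta = torch.randn(5, requires_grad=True)

# log N(theta; 0, sigma^2), up to an additive constant
log_prior = -0.5 * (theta ** 2).sum() / sigma ** 2
log_prior.backward()

# The gradient of the log prior is -theta / sigma^2, i.e. weight decay
# with coefficient 1 / sigma^2 on the loss side.
print(torch.allclose(theta.grad, -theta.detach() / sigma ** 2))  # True
```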
I would like to read your thesis once it is published. We can continue this conversation via email if you prefer. You can find me at: javier(dot)a(dot)es(at)ieee(dot)org
Agree partially! Emailing you and closing this issue - thanks all!