henripal / sgld


Question about SGLD optimizer #1

Closed goldkim92 closed 5 years ago

goldkim92 commented 6 years ago

Hi! It was impressive to see your code, especially how you subclass Optimizer to build the SGLD class in sgld_optimizer.py.

However, I'm wondering whether the way the noise is added to the gradient is slightly off. According to the SGLD paper (and also your ICML workshop paper), it seems the update should be

```python
langevin_noise = Normal(torch.zeros(size), torch.ones(size) / group['lr'])
p.data.add_(-group['lr'], d_p + langevin_noise.sample().cuda())
```

based on the fact that lr * (a sample from N(0, 1/lr)) is a sample from N(0, lr): scaling by lr multiplies the variance by lr², and (1/lr) * lr² = lr.
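For concreteness, here is a minimal numerical check of that identity (a sketch, not code from the repo; the second argument of N(.,.) is read as a variance, so the standard deviation is its square root):

```python
import torch

lr = 0.01
n = 1_000_000

# A sample from N(0, 1/lr): Gaussian with variance 1/lr, i.e. std = sqrt(1/lr)
z = torch.randn(n) * (1.0 / lr) ** 0.5

# Scaling by lr multiplies the variance by lr**2, so lr * z has variance lr
print((lr * z).var())  # roughly 0.01 == lr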

JavierAntoran commented 5 years ago

I have the same question as goldkim92. Additionally, in the original SGLD paper, they consider the effect of the prior through the gradient of log p(\theta). I believe your implementation ignores this term as it discards weight decay.
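For reference, a minimal sketch of how the prior term could be kept, assuming a Gaussian prior (the function and names below are illustrative, not from the repo): for theta ~ N(0, sigma^2), the gradient of -log p(theta) is theta / sigma^2, which is exactly a weight-decay term added to the likelihood gradient.

```python
import torch

def grad_with_gaussian_prior(p: torch.nn.Parameter, weight_decay: float) -> torch.Tensor:
    """Add the prior's contribution to the likelihood gradient.

    With a Gaussian prior N(0, sigma^2) on each weight, the gradient of the
    negative log prior is theta / sigma^2, i.e. weight_decay * theta with
    weight_decay = 1 / sigma^2 (illustrative; scaling conventions may differ).
    """
    d_p = p.grad.data
    if weight_decay != 0:
        d_p = d_p.add(p.data, alpha=weight_decay)
    return d_p
```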

henripal commented 5 years ago

Sorry - hadn't looked at this in a while! I spent a little time on it and I think you're definitely right - fixed in 60bd5646a2f25885b1b1f70f5db4ecc4d6481b26. I'm still taking the square root of the term you suggest in the implementation, since PyTorch's Normal takes a scale (standard deviation) parameter rather than a variance.
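A hedged sketch of what the fixed update could look like (illustrative names, not a verbatim copy of the commit):

```python
import torch
from torch.distributions import Normal

def sgld_param_update(p: torch.nn.Parameter, lr: float) -> None:
    """One SGLD step for a single parameter tensor (illustrative sketch)."""
    d_p = p.grad.data
    size = d_p.size()
    # Normal takes a scale (std), so use sqrt(1/lr) to get noise with variance 1/lr;
    # after multiplying by lr, the injected noise has variance lr, as SGLD requires.
    langevin_noise = Normal(torch.zeros(size), torch.ones(size) / lr ** 0.5)
    p.data.add_(d_p + langevin_noise.sample(), alpha=-lr)
```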

henripal commented 5 years ago

Also @JavierAntoran - yes, I intentionally discard the prior term.

JavierAntoran commented 5 years ago

> Also @JavierAntoran - yes, I intentionally discard the prior term.

Thanks for the quick reply. Could you explain why you make this decision?

henripal commented 5 years ago

Ha, maybe a GitHub issue isn't the best place to talk about this, but:

JavierAntoran commented 5 years ago

Hmm, I'm not sure I'm following your reasoning.

I would like to read your thesis once it is published. We can continue this conversation via email if you prefer. You can find me at: javier(dot)a(dot)es(at)ieee(dot)org

henripal commented 5 years ago

Agree partially! Emailing you and closing this issue - thanks all!