henripal / sgld


Inquiry: Scaling the Lr #2

Closed · rb876 closed 5 years ago

rb876 commented 5 years ago

Hi, it is not clear to me how the learning rate (lr) is scaled throughout training.

Many Thanks

henripal commented 5 years ago

Here: https://github.com/henripal/sgld/blob/60bd5646a2f25885b1b1f70f5db4ecc4d6481b26/sgld/sgld/trainer.py#L94

Unlike the standard PyTorch optimizers, the SGLD optimizer takes an optional lr argument at each step!

rb876 commented 5 years ago

Oh thanks, and how do you scale the learning rate with respect to the number of data points? Looking at ChunyuanLI's implementation (https://github.com/ChunyuanLI/pSGLD/issues/2) and at gmarceaucaron's implementation (https://github.com/gmarceaucaron/natural-langevin-dynamics-for-neural-networks), they scale the Langevin noise by the number of training data points (Ntrain), or by the square root of Ntrain, to allow convergence. Do you do something similar?

henripal commented 5 years ago

@rb876 and I are discussing this by email - closing the issue!