Closed fritzo closed 5 years ago
@fritzo Personally I would say it is not worth wasting time on any SGLD-like algorithm for inference, particularly when Pyro has such nice things as normalising flows.
SGLD, despite having a million papers written about it and its variants, does not perform well as a substitute for MCMC, which is what it is sold as. See the figures in this (soon to be JMLR) paper: https://arxiv.org/pdf/1708.00955.pdf
It often gives nonsensical results compared to the true posterior (and thus, in my opinion, is a terrible drop-in replacement for MCMC), but because it is essentially SGD with added noise it can give decent predictions. The examples there are sufficiently easy that they do not need preconditioning, but I believe the outcome would be the same with it.
So what's to gain from this "pseudo-MCMC" method when there are already good approximate inference methods and MAP estimation in Pyro?
I think time would be better spent incorporating approximate-curvature VI methods, such as https://arxiv.org/pdf/1712.02390.pdf, if you are interested in more geometric inference methods.
Just my two cents. :)
for what it's worth, i generally tend to agree with @robsalomone's sentiment about these sgld-like algorithms.
another interesting curvature thingie is this paper.
@robsalomone Thanks, that's helpful.
It would be nice to have a pSGLD optimizer as in Li et al. (2015). This MCMC method is a nice drop-in replacement for a PyTorch optimizer (e.g. for Adam). pSGLD is also compatible with data subsampling, unlike HMC.
Cf. existing implementations in TensorFlow Probability, PyTorch (pysgmcmc), and MXNet.
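For anyone skimming this thread, here is a minimal sketch of the SGLD and pSGLD updates under discussion, in plain Python rather than as a `torch.optim.Optimizer` subclass (the function names and the toy target are mine, purely illustrative — this is not code from Pyro or from any of the linked libraries). The point is just how small the delta from plain SGD is: SGLD adds Gaussian noise scaled by the step size, and pSGLD additionally scales the step and the noise by an RMSProp-style diagonal preconditioner as in the Li et al. paper.

```python
import math
import random

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update on a scalar parameter.

    Gradient ascent on the log posterior plus Gaussian noise
    with variance equal to the step size.
    """
    noise = rng.gauss(0.0, math.sqrt(step_size))
    return theta + 0.5 * step_size * grad_log_post + noise

def psgld_step(theta, grad_log_post, v, step_size, rng,
               alpha=0.99, eps=1e-5):
    """One pSGLD update with an RMSProp-style diagonal preconditioner.

    v is a running average of squared gradients; returns (theta, v).
    """
    v = alpha * v + (1.0 - alpha) * grad_log_post ** 2
    g = 1.0 / (math.sqrt(v) + eps)  # preconditioner
    noise = rng.gauss(0.0, math.sqrt(step_size * g))
    return theta + 0.5 * step_size * g * grad_log_post + noise, v

# Toy target: standard normal posterior, log p(x) = -x^2 / 2,
# so grad log p(x) = -x.  (Illustrative only; in practice the
# gradient would come from a minibatch, which is the whole appeal.)
rng = random.Random(0)

theta, sgld_samples = 0.0, []
for _ in range(5000):
    theta = sgld_step(theta, -theta, step_size=0.1, rng=rng)
    sgld_samples.append(theta)

theta, v, psgld_samples = 0.0, 1.0, []  # v started at 1 to avoid a huge first step
for _ in range(5000):
    theta, v = psgld_step(theta, -theta, v, step_size=0.1, rng=rng)
    psgld_samples.append(theta)

sgld_mean = sum(sgld_samples) / len(sgld_samples)
psgld_mean = sum(psgld_samples) / len(psgld_samples)
```

On this trivially easy target both chains hover around the true posterior mean of 0 — which is consistent with the criticism above: the update looks plausible on easy problems, while the bias from the missing Metropolis correction only shows up on harder ones.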