Closed fritzo closed 5 years ago
@fritzo Personally I would say it is not worth wasting time on any SGLD-like algorithm for inference, particularly when Pyro has such nice things as normalising flows.
SGLD, despite having a million papers written about it and its variants, does not perform well as a substitute for MCMC, which is what it is sold as. See the figures in this (soon to be JMLR) paper: https://arxiv.org/pdf/1708.00955.pdf
It often gives nonsensical results compared to the true posterior (and thus, in my opinion, is a terrible drop-in replacement for MCMC), but because it is essentially SGD with added noise it can give decent predictions. The examples there are sufficiently easy that they do not need preconditioning, but I believe the outcome would be the same with it.
So what's to gain from this "pseudo-MCMC" method when there are already good approximate inference methods and MAP estimation in Pyro?
I think time would be better spent incorporating approximate-curvature VI methods, such as https://arxiv.org/pdf/1712.02390.pdf, if you are interested in more geometric inference methods.
Just my two cents. :)
for what it's worth, i generally tend to agree with @robsalomone's sentiment about these sgld-like algorithms.
another interesting curvature thingie is this paper.
@robsalomone Thanks, that's helpful.
It would be nice to have a pSGLD optimizer as in Li et al. (2015). This MCMC method is a nice drop-in replacement for a PyTorch optimizer (e.g. for Adam). pSGLD is also compatible with data subsampling, unlike HMC.
Cf. existing implementations in TensorFlow Probability, PyTorch (pysgmcmc), and MXNet.
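For anyone skimming this thread, here is a minimal sketch of the SGLD and pSGLD updates under discussion, in plain Python rather than as a `torch.optim.Optimizer` subclass (the function names and the toy target are mine, purely illustrative — this is not code from Pyro or from any of the linked libraries). The point is just how small the delta from plain SGD is: SGLD adds Gaussian noise scaled by the step size, and pSGLD additionally scales the step and the noise by an RMSProp-style diagonal preconditioner as in the Li et al. paper.

```python
import math
import random

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update on a scalar parameter.

    Gradient ascent on the log posterior plus Gaussian noise
    with variance equal to the step size.
    """
    noise = rng.gauss(0.0, math.sqrt(step_size))
    return theta + 0.5 * step_size * grad_log_post + noise

def psgld_step(theta, grad_log_post, v, step_size, rng,
               alpha=0.99, eps=1e-5):
    """One pSGLD update with an RMSProp-style diagonal preconditioner.

    v is a running average of squared gradients; returns (theta, v).
    """
    v = alpha * v + (1.0 - alpha) * grad_log_post ** 2
    g = 1.0 / (math.sqrt(v) + eps)  # preconditioner
    noise = rng.gauss(0.0, math.sqrt(step_size * g))
    return theta + 0.5 * step_size * g * grad_log_post + noise, v

# Toy target: standard normal posterior, log p(x) = -x^2 / 2,
# so grad log p(x) = -x.  (Illustrative only; in practice the
# gradient would come from a minibatch, which is the whole appeal.)
rng = random.Random(0)

theta, sgld_samples = 0.0, []
for _ in range(5000):
    theta = sgld_step(theta, -theta, step_size=0.1, rng=rng)
    sgld_samples.append(theta)

theta, v, psgld_samples = 0.0, 1.0, []  # v started at 1 to avoid a huge first step
for _ in range(5000):
    theta, v = psgld_step(theta, -theta, v, step_size=0.1, rng=rng)
    psgld_samples.append(theta)

sgld_mean = sum(sgld_samples) / len(sgld_samples)
psgld_mean = sum(psgld_samples) / len(psgld_samples)
```

On this trivially easy target both chains hover around the true posterior mean of 0 — which is consistent with the criticism above: the update looks plausible on easy problems, while the bias from the missing Metropolis correction only shows up on harder ones.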