pyro-ppl / numpyro

Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation to GPU/TPU/CPU.
https://num.pyro.ai
Apache License 2.0
2.09k stars 227 forks source link

Use biased autocorrelation estimate by default #1785

Open jonny-so opened 2 months ago

jonny-so commented 2 months ago

I was noticing some very erratic and unexpected behaviour from the effective_sample_size diagnostic, which was due to some extreme values in the far right tail of the autocorrelation function. This can be reproduced pretty easily by plotting the autocorrelation of a sequence of 1000 IID Gaussians.

This behaviour seems to be due to the following line https://github.com/pyro-ppl/numpyro/blob/2f1bccdba2fc7b0a6ec235ca1bd5ce2417a0635c/numpyro/diagnostics.py#L130 which I believe is there to make the estimate unbiased. Stan however uses the biased estimate of Geyer (1992) (see https://github.com/stan-dev/stan/blob/634034deb3abd6314d980c1aab083f64269f4019/src/stan/analyze/mcmc/autocovariance.hpp#L60), presumably to stop the issue of the variance in the right tail exploding.

I have local changes which (optionally) use the biased estimate. I think this should be the default, partly to be consistent with Stan, but also because it seems too erratic to be useful with the current behaviour. It seems to be an intentional departure from the Stan implementation however, so I thought best to open an issue here to discuss.

fehiepsi commented 2 months ago

Wow, this is subtle. It would be great if you could contribute a PR, @jonny-so! Could you also add some tests to illustrate that the biased estimate behaves better in some cases? Thanks!