tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/

Discard steps according to "is_accepted" boolean from kernel results #1406

Closed anvvalade closed 3 years ago

anvvalade commented 3 years ago

When running HMC (or a derived method), a boolean `is_accepted` is associated with each step. Is it good practice to get rid of the steps where this value is `False`? Or should all the results be used, regardless of their `is_accepted` indicator?

All the tutorials seem to use all the steps, which is a bit confusing, since a large proportion of them can be marked as "not accepted". Would it be possible to explain explicitly in one of the tutorials what this boolean is and how it should be used?

SiegeLordEx commented 3 years ago

To be clear, what is being "accepted" is a proposal for the next state in the Markov chain. If the proposal is rejected, the state remains as it was before (this is the behavior under Metropolis-Hastings; it can differ for other kernels). Even though the proposal is rejected, the transition is still valid. In other words, it is correct to look at all the results, as they all represent the Markov chain formed by the transition kernel.

The way you can use `is_accepted` is as a diagnostic: if the average acceptance rate is too low or too high, then your proposal is not well tuned and your Markov chain isn't exploring the target distribution efficiently. For HMC, good acceptance rates are around 0.7-0.9.
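For concreteness, here is a minimal sketch of tracing `is_accepted` and using it as a diagnostic. The standard-normal target and the tuning values are placeholders, not recommendations:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Toy target: a standard normal. In practice this is your model's log-prob.
target = tfd.Normal(loc=0., scale=1.)

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target.log_prob,
    step_size=0.5,
    num_leapfrog_steps=3)

samples, is_accepted = tfp.mcmc.sample_chain(
    num_results=1000,
    num_burnin_steps=500,
    current_state=tf.zeros([]),
    kernel=kernel,
    trace_fn=lambda _, pkr: pkr.is_accepted)

# Diagnostic only -- do NOT filter `samples` by `is_accepted`.
acceptance_rate = tf.reduce_mean(tf.cast(is_accepted, tf.float32))
print('acceptance rate:', acceptance_rate.numpy())
```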

junpenglao commented 3 years ago

Just want to reiterate that it is NOT good practice to get rid of the rejected samples! If you do, your MCMC chain is no longer ergodic and you will get very wrong results in general.
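Continuing the sketch above, the anti-pattern versus the correct usage looks like this:

```python
import tensorflow as tf

# WRONG: dropping the rejected steps removes the repeated states that the
# Metropolis-Hastings correction relies on, biasing downstream estimates.
biased_samples = tf.boolean_mask(samples, is_accepted)

# RIGHT: keep every step. A rejected proposal simply repeats the previous
# state, and that repetition is part of the valid chain.
posterior_mean = tf.reduce_mean(samples)
```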

anvvalade commented 3 years ago

Thank you very much for your clarifications.

Similarly, do the results of a NUTS kernel with `has_divergence = True` have to be discarded?

junpenglao commented 3 years ago

Betancourt has an excellent write-up on divergences and HMC that you should read: https://betanalpha.github.io/assets/case_studies/divergences_and_bias.html

TL;DR: in principle you should not trust MCMC results that contain divergences (even as few as 1 in a single chain), as they may indicate that there is difficult geometry in the posterior that the sampler is not able to explore efficiently.

In practice, if you are confident that:

- the divergence is due to numerical error -> you can keep all samples;
- the divergence is due to poor adaptation of a single chain -> you can discard that single chain.

(Note that discarding just the samples that contain divergences is not recommended.)
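For reference, here is a minimal sketch of tracing `has_divergence` from a NUTS kernel, again with a placeholder standard-normal target and placeholder tuning values:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Placeholder target; in practice this is your model's log-prob.
target = tfd.Normal(loc=0., scale=1.)

nuts = tfp.mcmc.NoUTurnSampler(
    target_log_prob_fn=target.log_prob,
    step_size=0.5)

samples, has_divergence = tfp.mcmc.sample_chain(
    num_results=1000,
    num_burnin_steps=500,
    current_state=tf.zeros([]),
    kernel=nuts,
    trace_fn=lambda _, pkr: pkr.has_divergence)

# Even a single divergent transition is worth investigating.
num_divergent = tf.reduce_sum(tf.cast(has_divergence, tf.int32))
print('divergent transitions:', num_divergent.numpy())
```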