pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

SMC: improve its efficiency by using samples from all (high-temperature) stages #2519

Closed aloctavodia closed 5 years ago

aloctavodia commented 7 years ago

I have been using the new Sequential Monte Carlo sampler for some very hard-to-sample multidimensional distributions with great success, thanks @hvasbath for that!

Currently the SMC sampler only uses the samples from the lowest temperature (latest iteration). I know there are ways to recycle/use samples from the higher temperatures. One reference I have found is this one: https://arxiv.org/abs/1504.05753. Before trying to implement anything, I wonder if someone has other references to share or thoughts about this point.

seanlaw commented 7 years ago

@aloctavodia Can you provide a quick explanation of SMC?

In the case of temperature replica exchange (à la molecular dynamics simulations), there are multiple replicas of the same system being simulated at various temperatures, and neighboring temperatures occasionally swap as long as some Metropolis criterion is met. This method was popularized by Sugita and Okamoto.

Note that one is not confined to exchanges in temperature space; in fact, one could also exchange in Hamiltonian space.
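For concreteness, the swap step can be sketched like this (a toy illustration invented for this comment, not code from any MD package; energies and inverse temperatures are made up):

```python
import numpy as np

def swap_accepted(energy_i, energy_j, beta_i, beta_j, rng):
    """Metropolis criterion for swapping two neighboring replicas: accept
    with probability min(1, exp((beta_i - beta_j) * (energy_i - energy_j)))."""
    log_alpha = (beta_i - beta_j) * (energy_i - energy_j)
    return np.log(rng.uniform()) < log_alpha

# A swap that moves the lower-energy configuration to the colder
# (higher-beta) replica makes log_alpha positive and is always accepted.
rng = np.random.default_rng(0)
swap_accepted(energy_i=-8.0, energy_j=-10.0, beta_i=1.0, beta_j=0.5, rng=rng)
```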

While exchanges between neighboring replicas are one aspect of the sampling, the other key is being able to use the data coming from ALL temperatures (or Hamiltonians) rather than deriving your probability distribution strictly from the lowest temperature. To accomplish this, one simply needs to use a method called the Weighted Histogram Analysis Method (WHAM). And, in the limit of zero bin widths, WHAM is also often referred to by its equivalent name, the Multistate Bennett Acceptance Ratio (MBAR).
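In the zero-bin-width limit, the WHAM equations reduce to a small self-consistent iteration. Here is a rough sketch (not the pymbar API; the `u_kn`/`N_k` conventions follow common MBAR notation, but the function itself is invented for illustration):

```python
import numpy as np
from scipy.special import logsumexp

def mbar_free_energies(u_kn, N_k, n_iter=500):
    """Self-consistent estimate of the dimensionless free energies f_k of
    K states, from samples pooled across all states (zero-bin-width WHAM).

    u_kn : (K, N) array; u_kn[k, n] is the reduced potential of pooled
           sample n evaluated in state k.
    N_k  : (K,) array; number of samples originally drawn from each state.
    """
    f = np.zeros(len(N_k))
    for _ in range(n_iter):
        # log of the mixture-distribution denominator for each pooled sample
        log_denom = logsumexp(np.log(N_k)[:, None] + f[:, None] - u_kn, axis=0)
        f = -logsumexp(-u_kn - log_denom[None, :], axis=1)
        f -= f[0]  # fix the arbitrary offset: f_0 = 0
    return f
```

For two states whose reduced potentials differ by a constant c, the estimated free-energy difference converges to exactly c, which is a handy sanity check.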

A well known Python implementation of MBAR by Shirts and Chodera can be found here.

And an open source C++ implementation of (zero-bin width) WHAM can be found here. Full disclosure, I wrote this code but no longer support it as it's been several years since I've last touched it.

junpenglao commented 7 years ago

@seanlaw The current SMC algorithm implemented in PyMC3 is from this paper:

Minson, S. E., Simons, M. and Beck, J. L. (2013), Bayesian inversion for finite fault earthquake source models I: theory and algorithm. Geophysical Journal International, 194(3), pp. 1701-1726

cc @hvasbath

aloctavodia commented 7 years ago

Hi @seanlaw, nice to see you here! And thanks for your input, I will check those links and see if they are applicable to SMC. Now that you mention it, I remember using WHAM years ago in the protein-folding context, but it was a black box for me :)

In SMC you start at a high temperature and then move sequentially to lower temperatures. SMC is somewhat similar to a genetic algorithm:

  1. We initialize a population (independent points in the sample space).
  2. We mutate them (run a short MCMC for each point)
  3. For the last sample of each run we perform selection, weighting the samples according to their posterior probability; in the next stage, points with higher weights are used to initialize more runs than points with lower weights (the population size is fixed, so some points from the previous stage will not initialize any new run).
  4. We repeat from 2 until we reach the lowest temperature.

When I say posterior probability, I am referring to a tempered posterior:

$p(\theta \mid y, \beta_m) \propto p(y \mid \theta)^{\beta_m} \, p(\theta)$

where $\beta_m$ is the temperature-like parameter, which goes from 0 (just the prior) to 1 (the full posterior).
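Putting the four steps and the tempered posterior together, a toy version of the scheme could look like this (illustrative only: the 1D conjugate-normal model, fixed temperature ladder, and step size are all made up for the example, and real implementations such as PyMC3's choose the $\beta$ schedule adaptively):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model, assumed for illustration: prior N(0, 5), one observation y = 2
# with unit noise, so the exact posterior is N(1.923, 0.981).
def log_prior(theta):
    return -0.5 * (theta / 5.0) ** 2

def log_like(theta):
    return -0.5 * (theta - 2.0) ** 2

def smc(n_particles=2000, betas=(0.0, 0.1, 0.3, 0.6, 1.0), n_mcmc=20, step=0.5):
    # 1. initialize the population from the prior (beta = 0)
    theta = rng.normal(0.0, 5.0, size=n_particles)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # 3. selection: weight by the incremental tempered likelihood,
        #    then resample so high-weight points seed more chains
        log_w = (beta - beta_prev) * log_like(theta)
        w = np.exp(log_w - log_w.max())
        theta = theta[rng.choice(n_particles, size=n_particles, p=w / w.sum())]
        # 2. mutation: short random-walk Metropolis targeting the
        #    tempered posterior p(y|theta)^beta * p(theta)
        for _ in range(n_mcmc):
            prop = theta + rng.normal(0.0, step, size=n_particles)
            log_acc = (beta * log_like(prop) + log_prior(prop)
                       - beta * log_like(theta) - log_prior(theta))
            theta = np.where(np.log(rng.uniform(size=n_particles)) < log_acc,
                             prop, theta)
    # 4. repeat until beta = 1; only the final-stage samples are returned,
    #    which is exactly the behavior this issue proposes to improve
    return theta

samples = smc()
```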

[figure: illustration of the SMC tempering stages]

I hope this helps

seanlaw commented 7 years ago

The visual helped. I think I have a better conceptual idea of how SMC works. In relation to our offline conversation, can you tell me what you mean by "how to recycle high temperature samples"?

When I say "use all samples from all temperatures" using replica exchange, I am referring to how to reconstruct the (posterior?) distribution after all sampling is complete and that would be via WHAM. I'm not sure I am using the same terminology so you'll have to forgive my lack of Bayesian thinking.

aloctavodia commented 7 years ago

I don't know if "recycling" is the best term, but I am also talking about using all samples from all temperatures.

hvasbath commented 7 years ago

@aloctavodia thanks a lot for starting this! I am glad it works so well for you! Where did you find this nice figure? Did you prepare it yourself? I think we shouldn't use all samples from all temperatures, but rather selected samples from all temperatures?

@junpenglao Actually the current implementation is a mixture of the one you mentioned and this: http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399%282007%29133%3A7%28816%29

But the original paper where it has been developed is this: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2006.00553.x/abstract

The other ones more or less introduce it to the engineering and geoscience communities, respectively; the math is basically the same. Just for citation purposes we should give credit to the original developers...

aloctavodia commented 7 years ago

@hvasbath I got the figure from Google Images. My intention is to prepare one myself, mainly for an internal group-meeting talk (I could send it to you when ready). I am thinking now that we should probably have a notebook describing the SMC sampler :)

We can use all samples, if they are weighted properly, to estimate any desired quantity. Still, the details are elusive to me; I hope things become clear once I read about WHAM and take a second look at https://arxiv.org/abs/1504.05753
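For a single stage, the correction is just self-normalized importance weighting (a sketch of the principle only; `stage_weights` is a hypothetical helper, and combining several stages properly also needs their relative evidences, which is where WHAM/MBAR-style machinery comes in):

```python
import numpy as np

def stage_weights(log_like, beta_m):
    """Self-normalized importance weights that take draws from the tempered
    posterior p(y|theta)^beta_m p(theta) to the full posterior (beta = 1).

    log_like : array of log p(y|theta) evaluated at each sample of stage m.
    """
    log_w = (1.0 - beta_m) * log_like
    w = np.exp(log_w - log_w.max())   # stabilize before exponentiating
    return w / w.sum()
```

At $\beta_m = 1$ the weights are uniform (the last stage already targets the posterior); the smaller $\beta_m$ is, the more the weights concentrate on high-likelihood samples.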

hvasbath commented 7 years ago

I started describing the SMC sampler in a condensed version for a paper I am writing. We could use this as a base and change it; otherwise we will get into trouble with copyright later on, if my article eventually gets published ...

Good point @aloctavodia! Unfortunately, I won't have time to look into it until the mentioned article is in good shape, but maybe in October I could have a look as well...

hvasbath commented 7 years ago

Another paper which might be worth noting and implementing: https://arxiv.org/abs/1103.3965

It is about efficient sampling with as few chains as possible for high-dimensional problems... Maybe the other points work in the same direction ... For now, the number of chains has to be very high compared to the number of variables in order to get a valid posterior...

junpenglao commented 7 years ago

Fixed in #2563

hvasbath commented 7 years ago

I think this is not off the table, and #2563 is rather unrelated, or am I missing something?

aloctavodia commented 7 years ago

I agree with @hvasbath that this is not off the table (at least not yet). But I understand @junpenglao's confusion. Let me try to clarify: the computation of the marginal likelihood in #2563 uses importance weights from all stages. Those same weights could be used to get (weighted) samples from all stages/temperatures, but at the moment we are not doing that; instead we just return all the samples from the last stage. It is not entirely clear to me which strategy is best; intuitively I see pros/cons for both of them. Anyway, I could start working on this issue in the next few days, if that is OK with you guys.
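A naive version of that pooling could look like the sketch below (hypothetical code, not what #2563 implements; note it ignores the relative normalizing constants of the stages, which a proper recycling scheme would also have to estimate):

```python
import numpy as np

def pool_stages(stage_samples, stage_log_like, betas, rng):
    """Pool draws from every SMC stage, reweighting each stage's samples
    from its tempered posterior (exponent beta_m) toward the full
    posterior, then resampling from the weighted cloud.

    CAVEAT: this treats all stages as one importance-sampling pool and
    ignores their relative evidences; it is only meant to show the idea.
    """
    pooled = np.concatenate(stage_samples)
    log_w = np.concatenate([(1.0 - b) * ll
                            for ll, b in zip(stage_log_like, betas)])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(pooled), size=len(pooled), p=w)
    return pooled[idx]
```

With only the final stage (beta = 1) this degenerates to a plain resample of the last-stage samples, i.e. the current behavior.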

junpenglao commented 7 years ago

I see, thanks for the explanation!