tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License
407 stars 62 forks source link

Resampling a Population does not normalize #241

Open turion opened 1 year ago

turion commented 1 year ago

Describe the bug When applying a resampler, the total evidence stays the same.

To Reproduce

ghci> import Control.Monad.Bayes.Population
ghci> import Control.Monad.Bayes.Sampler.Strict
ghci> let model = spawn 1000 >> normal 0 1 >>= (condition  . (>= 0))
ghci> sampleIOfixed  $ evidence model
0.48900000000000027
ghci> sampleIOfixed  $ evidence $ resampleStratified model
0.48900000000000027

Expected behavior

I'm not sure whether this is intended. It seems like it, given this line: https://github.com/tweag/monad-bayes/blob/110e040099d7f1dc1c5f0bd6170fed759897a5b6/src/Control/Monad/Bayes/Population.hs#L116

Extra care is taken to make sure that the total mass from before (z) is restored, and no accidental normalization is introduced.

There used to be a function normalize, which was removed in 771ce2ece83f0de7cc0d592bec6dd6e660024ddc by @reubenharry, as part of https://github.com/tweag/monad-bayes/pull/142. It was part of the 0.1.1.0 release: https://hackage.haskell.org/package/monad-bayes-0.1.1.0/docs/Control-Monad-Bayes-Population.html#v:normalize

Its haddocks read:

Normalizes the weights in the population so that their sum is 1. This transformation introduces bias.

As of master, there are other ways to "normalize" the distribution, but they all require MonadFactor on the inner monad. This is sometimes not feasible. For example when doing particle filtering (see https://github.com/turion/dunai-bayes/), one needs to condition and resample on every step of the time series. Then the probability masses of the individual particles drop at an exponential rate, no matter how much we condition in the inner monad. Very soon, they shrink below 2^(-64) and cannot be resolved by a Double anymore. In these situations, I would have believed that it makes sense to normalize, so the numbers stay in a region we can calculate with.

What I don't understand are the comments about "bias". In 1d5521840e9081603bc760621af3d9bc2685da9c it is commented by @adscib that normalize indroduces bias, and I don't understand what that means. In particular I don't understand whether normalizing at every time step in a particle filter then also introduces bias. If yes, how do other implementations of particle filters deal with this situation?

reubenharry commented 1 year ago

On a statistical level, I also don't understand this comment. I have no objection to putting this function back. Standard caveat: @adscib definitely knew what he was doing when writing this, so I would strongly recommend against any change to e.g. resampleGeneric without total certainty that it won't change the semantics. My guess is that the reason normalization in resampleGeneric doesn't happen is because it needs to be a measure preserving transformation in order for various algorithms to be correct, (e.g. maybe PMMH and RMSMC).

reubenharry commented 1 year ago

I think I removed it because it wasn't used anywhere in the codebase, which wasn't a very sensible decision.