tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License

Confusion over the use of pmmh #104

Closed alessio-b-zak closed 1 year ago

alessio-b-zak commented 4 years ago

I am trying to implement a small discrete-time SIR model as an HMM and fit it using pmmh. I am not sure whether I am using monad-bayes correctly and, if I am, why the performance of pmmh on my model is so poor.

The type signature of pmmh gives the output type m [[(a, Log Double)]], where a is the output type of the model. My usual understanding of PMMH is that it returns samples from the posterior, but when running the implemented version we receive a list of samples for each MH iteration, one per particle. Upon inspection, each of these values appears to be the same, which matches my understanding of PMMH. Is the information from each particle returned in order to give access to the weights of each particle, or do I have a more fundamental misunderstanding?
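For concreteness, that observation can be checked with something like the following (a sketch only; it assumes the pmmh result has been bound to pmmhRes as in the snippet further down, and that Params has an Eq instance):

import Numeric.Log (Log)

-- True if every particle in a single MH iteration carries the same value
sameWithinIteration :: Eq a => [(a, Log Double)] -> Bool
sameWithinIteration iteration = all ((== fst (head iteration)) . fst) iteration

Evaluating all sameWithinIteration pmmhRes then returns True precisely when the per-particle values are identical.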

Furthermore, my interpretation of how I am meant to use the pmmh function to fit parameters is that I pass a model that scores against the given data whilst evolving the underlying model, returning at the end the parameters, which are reweighted by the scoring procedure. The following model does that, folding over the data and scoring each datum individually.

scoreEpidemicToData
    :: MonadInfer m
    => FixedParams
    -> Epidemic
    -> LatentState
    -> Params
    -> m Params
scoreEpidemicToData fixedParams ys initialState params = do
    -- fold the per-datum scoring step over the observations, then return the parameters
    foldM_ (scoreEpidemicToDatum fixedParams params) initialState (unwrapEpidemic ys)
    return params

where scoreEpidemicToDatum takes the following form

scoreEpidemicToDatum
    :: (MonadSample m, MonadCond m)
    => FixedParams
    -> Params
    -> LatentState
    -> Int
    -> m LatentState
scoreEpidemicToDatum fixedParams params x datum = do
    let obs lambda y = score (poissonPdf lambda y)
    x' <- transitionModel fixedParams params x
    obs (fromIntegral (inf x') * rho params) datum
    return x'

I am performing inference by using pmmh as follows

sampleIO $ do
    pmmhRes <- prior $ pmmh nsteps 14 nparticles paramsPrior (scoreEpidemicToData fixedParams dat initialState)

where my data is a [Double] of length 14. Is this the correct way to use pmmh to perform inference over parameters? If so, is there any way to improve the performance of pmmh? At present it is not feasible to do inference on what is quite a simple model (sampling from binomial and Poisson distributions) with enough particles or steps to get a good sense of whether the samples converge.

The full code can be found here.

alessio-b-zak commented 4 years ago

I fixed my performance issue by replacing foldM_ with direct recursion (a sketch of the change is below); however, it now appears that the chains do not move about in the parameter space, so I would be interested to know how to correctly construct a model that pmmh can do inference with.
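(Roughly the following; the exact change in the linked repository may differ.)

scoreEpidemicToData
    :: MonadInfer m
    => FixedParams
    -> Epidemic
    -> LatentState
    -> Params
    -> m Params
scoreEpidemicToData fixedParams ys initialState params = do
    let go _ []        = return ()
        go x (y : ys') = do
            x' <- scoreEpidemicToDatum fixedParams params x y
            go x' ys'
    go initialState (unwrapEpidemic ys)
    return params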

curiousleo commented 4 years ago

Thanks for the report! I'm happy to see usage of monad-bayes out there.

unwrapEpidemic

Don't unwrap it! I'd much rather see this epidemic wrapped up!

monad-bayes has not received a lot of performance optimisation to date. I hope to be able to give you a more substantial answer over the next couple of days.

alessio-b-zak commented 4 years ago

Thanks for the reply. I'm super keen to get to grips with the library and potentially contribute some other approximate inference algorithms. As I previously said, I've managed to increase the performance enough to make moderate-sized inferences possible (1000 steps with 1000 particles), but the main issue I'm having is that my chains aren't converging to the parameters I believe they should.

From an implementation of the same model and PMMH (with the same priors) that I have done in R and C++, the chain rapidly converges to parameters that have been validated with other models; however, my implementation of the model in monad-bayes under pmmh leaves the chains not moving anywhere. I believe that I do not fundamentally understand how to perform inference over the parameters of a model using pmmh. Any help with this issue would be greatly appreciated!

I'm going to dig into the paper and the source over the next day or two to see if I can get a grip on how the algorithm is implemented.

idontgetoutmuch commented 4 years ago

This https://github.com/rossng/sir-monad/blob/master/src/Model.hs#L65 doesn't look right to me: surely it should be positive, as infected subjects are filling this compartment, together with a component for the rate of recovery \gamma.

idontgetoutmuch commented 4 years ago

If it were me, I would look at the distributions of the particles for every MC proposal and check they were what I expected.

alessio-b-zak commented 4 years ago

This https://github.com/rossng/sir-monad/blob/master/src/Model.hs#L65 doesn't look right to me: surely it should be positive, as infected subjects are filling this compartment, together with a component for the rate of recovery \gamma.


Ah, I appear to have copied and pasted the code without changing the parameters! That should read -gamma * dt. I don't think that would be the cause of the lack of convergence of the chains. Thank you for the suggestion. I will try to understand the source a bit more to see if I can extract that information.

Edit: I have fixed the code and I still see a kind of stationarity in the chain. My intuition is that I am simply returning the same parameters without having their probability scored by the observations.

idontgetoutmuch commented 4 years ago

It's great you are doing this example - I am guessing the dataset is flu in the boarding school. You appear to be doing Euler with quite a coarse time step but I don't think this should be a problem. Let me know what results you get for each proposal. Perhaps we can plot the distributions somehow.

alessio-b-zak commented 4 years ago

It's great you are doing this example - I am guessing the dataset is flu in the boarding school. You appear to be doing Euler with quite a coarse time step but I don't think this should be a problem. Let me know what results you get for each proposal. Perhaps we can plot the distributions somehow.

Yes, I got rid of the time step just so I could debug performance problems earlier on. It is the boarding school influenza data indeed :). I will get back on this tomorrow and report back what I find! Thanks.

idontgetoutmuch commented 4 years ago

Ah I was wrong - you are not doing Euler but using the binomial to generate jumps according to the compartment model - it was after a glass of wine so I hope I may be forgiven.

I had another look and your prior for beta looks very low - I generated some fake data (aka prior predictive checks) and subjects were almost never getting infected. You have \Gamma(2,1), so the mode of your prior is 1.0. IIRC the rate is more like 10.0. I suspect the infection just dies out and there is no chance of inferring any parameters.

The first step I would suggest is generating fake paths for the model with your priors (a rough sketch of what I mean is below). Apologies if this is grandmothers and eggs.
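(Only a sketch: it reuses your paramsPrior, transitionModel, fixedParams and initialState definitions, assumes the same imports as the rest of your file, and simulates latent paths from the prior without any scoring.)

priorPredictive :: MonadSample m => Int -> m [LatentState]
priorPredictive n = do
    params <- paramsPrior
    let go 0 _ acc = return (reverse acc)
        go k x acc = do
            x' <- transitionModel fixedParams params x
            go (k - 1) x' (x' : acc)
    go n initialState []

Comparing a handful of these simulated paths against the data should show quickly whether your prior ever produces an outbreak at all.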

idontgetoutmuch commented 4 years ago

I can look again tomorrow afternoon. Perhaps you could post your working R and C++ code?

alessio-b-zak commented 4 years ago

As far as I'm aware, I'm using the same priors I did in my C++ implementation; I'll link it here. I think it's more likely that I am incorrectly interpreting how to use the pmmh and mh functions, as I've not yet fully understood tracing or trace MCMC. The C++ code that computes the particle filter is here, and the PMCMC implementation can be found here.

I've performed inference using other methods (ABC and synthetic likelihood) and computed similar parameter values: Beta: 2.1007173, Rho: 0.7902133, Gamma: 0.4945210.

I haven't checked recently, but I think most of the priors are reasonably close to these estimates, and I recall seeing similar parameters derived elsewhere.

Edit: Just to reiterate, I don't currently see how the implementation of pmmh can be used to perform inference over the parameters of the model. As in the C++ code, you want to simulate a particle for each of the latent states, and in this implementation that is only possible if the latent state is the output type of the model; however, the mh implementation returns samples according to the output type of the model. Those are not the parameters, so I'm unclear how you can obtain posterior samples of the parameters.

alessio-b-zak commented 4 years ago

I've had a bit of time to examine this and I have the impression that the pmmh algorithm should instead be

pmmh t k n param model = mh t $ do
  theta <- param
  _ <- (runPopulation . pushEvidence . Pop.hoist lift . smcSystematic k n . model) theta
  return theta

i.e. returning the parameters at the end with the model looking like this

scoreEpidemicToDatum'
    :: MonadInfer m
    => FixedParams
    -> Epidemic
    -> LatentState
    -> Params
    -> m LatentState
scoreEpidemicToDatum' fixedParams dat x params = do
    let obs lambda y = score (poissonPdf lambda y)
        go [] xs = return (head xs)
        go (y : ys) xs = do
            x' <- transitionModel fixedParams params (head xs)
            obs (fromIntegral (inf x') * rho params) y
            go ys (x' : xs)
    go (unwrapEpidemic dat) [x]

In this way the particle filter is run with the latent states as the individual particles. As evidence for this, running

evidence $ smcSystematic 14 k ((return $ Params 2 2 2) >>= (scoreEpidemicToDatum' fixedParams dat initialState))

with hardcoded parameter values (here all set to 2) produces the same values of the pseudo-marginal likelihood as a hard-coded particle filter in C++ (albeit the unlogged versions).

However, when making this change the chains still do not converge to the correct parameter values. I have used the following priors

paramsPrior :: MonadSample m => m Params
paramsPrior = do
    pBeta <- Control.Monad.Bayes.Class.gamma 2 2
    pRho <- Control.Monad.Bayes.Class.beta 2.0 2.0
    pGamma <- Control.Monad.Bayes.Class.gamma 2.0 0.8
    return (Params pRho pBeta pGamma )

and ran pmmh with 50 particles for 1000 MH steps. I obtain the following trace and density plot for the beta parameter. I am going to take a break from working on this for a while, but when I come back I will attempt to work out why the trace MCMC algorithm does not appear to behave as expected.

curiousleo commented 4 years ago

One thing I can add here is that both in Functional Programming for Modular Bayesian Inference and in Formally justified and modular Bayesian inference for probabilistic programs, @adscib described the pmmh algorithm as it is currently implemented in monad-bayes.

alessio-b-zak commented 4 years ago

@curiousleo I am aware; however, I think @adscib's comments within the paper add arguments in favour of this implementation. In section 4.2 of Functional Programming for Modular Bayesian Inference, the authors discuss altering their Gaussian random walk model to work with pmmh by extracting the prior sampling. They then discuss how the intention of pmmh is to perform inference over the parameters of the model. However, given the current form of the model, passing it into pmmh would return samples of the latent state of the random walk, not of the parameters, since random_walk returns the latent states and not the parameters.

I have, however, discovered that the particle filter with the first implementation does in fact produce the same estimates of the pseudo-marginal likelihood, so now I am confused. It is a shame that no example of using pmmh on the parameters of an HMM exists for the current implementation of pmmh. I am going to attempt to implement a plain Metropolis-Hastings algorithm instead of the trace version in order to debug convergence.

Thank you for all your help.

alessio-b-zak commented 4 years ago

Given what I said above, I decided to try to create an example based on the Gaussian random walk from the paper, changing the inference model to the following

random_walk :: MonadInfer m => [Double] -> Double -> m Double
random_walk ys s = do
    let obs x y = score (normalPdf x 1 y)
    let expand xs [] = return xs
        expand (x : xs) (y : ys') = do
            x' <- normal x s
            obs x' y
            expand (x' : x : xs) ys'
    _ <- expand [0] ys
    return s

i.e. returning the parameter instead of the latent states. I then generated some data using some test parameters and ran the standard pmmh, which managed to recover the parameters, unlike my alternative pmmh that uses @adscib's random walk function from the paper (a sketch of the experiment is below). I then took @idontgetoutmuch's advice to use different priors (the ones from my previous post) and indeed my original version does appear to output correct-ish posteriors (a lot slower than the random walk), albeit with an extremely low acceptance rate (see below for densities and traces for beta).
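(A minimal sketch of that experiment, assuming the same imports as the rest of the file; generateWalk is a hypothetical data simulator, and the step count, particle count and prior here are illustrative rather than the exact values I used.)

-- hypothetical simulator: a random walk with innovation sd s and
-- unit-variance Gaussian observation noise
generateWalk :: MonadSample m => Int -> Double -> m [Double]
generateWalk n s = go n 0
  where
    go 0 _ = return []
    go k x = do
        x' <- normal x s
        y  <- normal x' 1
        ys <- go (k - 1) x'
        return (y : ys)

testRandomWalk :: IO ()
testRandomWalk = do
    ys      <- sampleIO (generateWalk 50 2.0)
    samples <- sampleIO $ prior $ pmmh 1000 50 100 (gamma 2 2) (random_walk ys)
    print (map (fst . head) samples)   -- one posterior sample of s per MH step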

Given that the current MH implementation does not allow proposals to be specified, I am assuming this is the best we can do. I don't understand how the algorithm works in this form, as I am under the impression that the particle filter will use the parameters as particles, but I'll take it.

To sum up what I think I've deduced: my initial implementation was correct, i.e. to use the pmmh algorithm you want to return the parameters at the end of the model description. After running prior $ pmmh ..., you simply extract the head of each inner list, throwing away the associated weight, and that is a sample from the posterior (a sketch of the extraction is below). I still have some confusion about quite how this works, but I will dig into it myself.
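(A sketch only, reusing the nsteps, nparticles, fixedParams, dat and initialState names from earlier in the thread.)

posteriorSamples :: IO [Params]
posteriorSamples = sampleIO $ do
    pmmhRes <- prior $ pmmh nsteps 14 nparticles paramsPrior
                            (scoreEpidemicToData fixedParams dat initialState)
    -- one inner list per MH step; the parameter value is the same for every
    -- particle, so keep the first and drop the Log Double weight
    return (map (fst . head) pmmhRes)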

Sorry for the massive confusion. I will write up a small blog post outlining how to use this soon, but some messy code can be found here.

idontgetoutmuch commented 4 years ago

@alessio-b-zak I am not convinced this is now using MCMC to estimate the parameters; rather it is using the particle filter. This is fine; I do it all the time myself (for modelling reasons).

I too have dug through the code to find out how the proposals are being made, but I have no definitive answer as yet. In your original code I could see very big jumps, suggesting that the proposals are too big. AFAICS (and this might be complete nonsense), proposals are not drawn from e.g. a normal but from an empirical distribution (which might be an approximation to a normal).

A blog post is a great idea. I will write up some notes also.

alessio-b-zak commented 4 years ago

@idontgetoutmuch It looks like, with the trace MCMC algorithm described here, proposals in the monad-bayes code are made by randomly selecting one of the random decisions made throughout the program and changing it, see here. I can see that there is a pull request open to facilitate different proposal types, but I can't immediately see how to incorporate previous updates into the trace MCMC algorithm. There might be some more literature somewhere that can be dug into.

curiousleo commented 4 years ago

Allowing users to configure proposal distributions in a principled way is what Alex Lew did here:

alessio-b-zak commented 4 years ago

@curiousleo I have some time over the summer, so I will take some time to dig into this and attempt to implement it in monad-bayes.

ohad commented 2 years ago

Hey everyone --- just looking into this with @min-nguyen: where does this stand at the moment?

idontgetoutmuch commented 2 years ago

I will take a look - I have a standalone PMMH implementation in Haskell which I can benchmark against.

idontgetoutmuch commented 2 years ago

I have given this a bit of thought while shaving, and one of the problems (already noted above) is that monad-bayes does not allow you to control the proposals. Basically (@reubenharry please correct me), it takes a random path through the program (for some reason the PPL community calls these traces), picks a random point on the path, picks a value in [0, 1], and then replaces whatever was there before (a value in [0, 1]) with this new random value. This is converted, e.g. via the inverse CDF, into a new random path. So I think that if in your program you had something like x <- normal 0.0 1.0 then you will get a new proposal for x that is just drawn from this distribution. A toy illustration is below.
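(A toy illustration of that single-site mechanism, not the actual monad-bayes internals: a trace is just a list of uniforms, a proposal resamples one of them, and the program turns each uniform into a concrete draw via an inverse CDF. This sketch assumes the random and statistics packages.)

import System.Random (randomRIO)
import Statistics.Distribution (quantile)
import Statistics.Distribution.Normal (normalDistr)

-- propose a new trace by resampling a single uniform at a random position
proposeSingleSite :: [Double] -> IO [Double]
proposeSingleSite trace = do
    i <- randomRIO (0, length trace - 1)
    u <- randomRIO (0, 1)
    return [ if j == i then u else v | (j, v) <- zip [0 ..] trace ]

-- the program then reinterprets each uniform; e.g. a standard normal draw
-- is recovered via the inverse CDF
standardNormalFromUniform :: Double -> Double
standardNormalFromUniform = quantile (normalDistr 0 1)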

In "classical" MCMC we would expect the proposal to be independently controllable. Mathematically this is correct but is going to be inefficient. @curiousleo linked to a paper which apparently describes how one might configure proposal distributions (disclaimer: I haven't read it). One option would be to implement this. Another and perhaps better option would be to implement HMC with NUTS. Ideally we would have both.

@ohad what would you like to happen here? I think we know the problem. I am independently writing an epidemiology example, but now that I have rediscovered @alessio-b-zak's code I will try to get their example working along with mine. I think that @reubenharry has an HMC PR without NUTS, so I don't know if that is going to "solve" the problem here.

I am now thinking this is our top priority (well maybe first getting acceptance rates); without more efficient sampling, monad-bayes can't really be used for anything more complicated than regression (of course SMC and SMC^2 should just work).

min-nguyen commented 2 years ago

This is interesting to know, thanks. The reason this came up in conversation was that I noticed mh performed quite well for a very simple hidden Markov model, but then became extremely slow for the SIR model, which is just implemented as a slightly more sophisticated HMM (I think this is mainly a matter of numerical efficiency rather than statistical efficiency). There are of course lots of factors at play here, so it's hard to tell if the issue lies on our end.

idontgetoutmuch commented 2 years ago

FYI: we have an intern starting in December to look at using monad-bayes for agent-based models.

reubenharry commented 2 years ago

Yeah, as @idontgetoutmuch says, the proposal to jump from one trace to the next isn't customizable atm. It's not hard to make it so, and even to make random choices nameable so you can write a proposal in a nice lensy way by picking certain named choices to change, but you'd need to supply q(oldtrace|newtrace) and q(newtrace|oldtrace) yourself, which seems unergonomic. My plan was to copy what Gen does, maybe avoiding the issue of static types by using Any, since they have a nice story about customizable MCMC.

Separately, I started out working on HMC, detailed in another issue. No particular roadblock, but getting a correct and fast implementation is a bunch of work, so we'll see how it goes.

I'd like to get around to trying out the SIR model, and trickier state-space models more generally, so hopefully I can do that soon, once I've popped a few things off the stack (lazy sampler, quicker trace execution, pretty notebooks with examples, diagrams of traces...)

idontgetoutmuch commented 2 years ago

FYI I am reading Trace Types and Denotational Semantics for Sound Programmable Inference in Probabilistic Languages: https://dl.acm.org/doi/pdf/10.1145/3371087 but my current thinking is that we should go for a black-box HMC approach, as that seems to have been very successful for Stan. I am interested to see @reubenharry's Gen-like approach though.