tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License

Hamiltonian Monte Carlo #144

Open reubenharry opened 2 years ago

reubenharry commented 2 years ago

Hamiltonian Monte Carlo (HMC) is a widely used, gradient-based MCMC algorithm that is the backbone of Stan's inference. I plan to implement it for monad-bayes. Todos (checkboxes indicate things done on branch):

Optional:

I have the basic pieces working on branch, but the checkboxes indicate further steps. Current roadblocks:

adamConnerSax commented 2 years ago

Hi! I've been watching all this progress on monad-bayes with great interest! I use Stan a lot, using Haskell to manage the data and write the Stan code, then hs-cmdstan and CmdStan to run the models, and then Haskell again to process and analyze the results.

Right now the Stan code generation is...not pretty. But I'm working on a rewrite using a typed expression tree and type-level tracking of indexing into vectors, matrices and arrays, all of which sits in an AST of Stan statements. I think this will make things much cleaner and easier on the Haskell side, and perhaps make room for Haskell-side optimization (CSE, vectorization, consolidating or splitting into more for loops). That work so far is here.

Anyway, as I've watched the monad-bayes work, I've been wondering if it would be interesting to take some subset of things specified in monad-bayes and write equivalent Stan code to perform HMC. This is more and less interesting if monad-bayes can do the HMC itself: more interesting for testing, but less interesting for actual computation. Although Stan is pretty optimized, so it might also be a useful back-end for slower models? I have no idea how complicated this might be, but I'd be interested in, at the very least, figuring that out. Selfishly, it might help me do a better job on the Stan-writing code, since right now I am the only consumer and that doesn't always make for good design...

idontgetoutmuch commented 2 years ago

@adamConnerSax I guess you know about this? http://www.cs.nott.ac.uk/~psznhn/Publications/stancon2018.pdf

adamConnerSax commented 2 years ago

I hadn't! Nice. Is that work ongoing?

I agree with the paper that keeping the post-processing bits and the model in sync can be tricky, but if you want to utilize the actual structure of the posterior you need to have Stan do its "generated quantities" thing (or do something similar yourself with the output of each draw). I just generate many Stan files with various generated-quantities blocks and run them using the "generate quantities" mode of CmdStan. So I only run the model once but get all the generated stuff. There's some book-keeping, but I don't find it so onerous. Which, I guess, is why I am trying to make it easy to write Stan from Haskell.

Edit: Ah. I see I misunderstood. That library keeps all the Stan samples to use for prediction or whatever. That makes more sense. I can see how that would be useful and maybe easier and faster than using Stan's generated quantities block.

reubenharry commented 2 years ago

Hi! Sorry for the slow response. Yes, it sounds interesting to compile monad-bayes into Stan. I can imagine that taking the Free monad representation of a probabilistic program and writing it as Stan might actually be possible. Happy to chat more about that. The HMC in monad-bayes is going to be a reference implementation (i.e. much slower than Stan, most likely), at least until Haskell has a better autodiff story - which might be a while. If you have more thoughts about use cases, or things you'd like to be able to do in monad-bayes, do let me know :)

Edit: your code looks cool! How operational is it? What would need to be done to connect it with monad-bayes?

adamConnerSax commented 2 years ago

No worries about the time! This is all on my slow track anyway.

The code I pointed you to is my newer version of an old thing which is much less strongly typed but is operational. I'm a couple weeks away (slow-track!) from the new version taking over from the old.

Then I'd need to do a lot of cleanup in the part that handles running via cmdstan. The new part is just building the Stan code from a Haskell AST. But then I have a bunch of code for managing data, checking if code has changed and re-running only what's necessary, etc. Not sure how much of that would be worthwhile or if it might be better to just have a translator as a first step?

What I've not thought about at all is how the interface with monad-bayes would work. Maybe when I come up for air after switching to the new expression/statement AST, I'll look at monad-bayes more and see if I can figure out what that might look like?

I think monad-bayes is more general (part of why I'd love to see them connected, so I could write things to use all the tools monad-bayes makes available but have Stan for speed when the model allows).

And I've no idea how to think about the organizational aspects: the way Stan breaks things into generated quantities, etc. Maybe there will be something beautiful just allowing me to know what's in the model and what's a generated quantity; this feels vaguely possible, since maybe the "model" is only the things which affect the "target" log-likelihood, plus anything required by them. But that might all be hard to manage without some sort of annotation layer to guide the translation.

But I don't know yet! I am looking forward to thinking about it.

reubenharry commented 2 years ago

Sounds great! Feel free to ask questions at any point (once you come up for air) if you want to understand monad-bayes better. Also check out https://monad-bayes.netlify.app/ , which goes through the API and implementation of monad-bayes fairly extensively.

One thing I'm keen to try with monad-bayes is to actually write some non-trivial models with real world data and see where the speed bottlenecks are, so this would be a good incentive.

turion commented 1 year ago

This would be a really cool addition! Let me know if you need some help!

many distributions are defined using inverse cdfs from the statistics package, which use concrete type Double. I need versions that are in terms of arbitrary Num n => n, for autodiff to work.

Tbh that sounds like a fork is needed. At least a fork of the parts that contain the distributions.
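For illustration, the kind of generalization being asked for might look like the following: a normal log-pdf written against `Floating a` rather than a concrete `Double`, so that an autodiff type (e.g. from the `ad` package) can instantiate it. This is a sketch; `normalLogPdf` is a hypothetical name, not the statistics package's API.

```haskell
-- A log-pdf polymorphic in its number type: instantiating `a` with an
-- automatic-differentiation type gives gradients for free, while
-- instantiating with Double recovers the ordinary numeric version.
normalLogPdf :: Floating a => a -> a -> a -> a
normalLogPdf mu sigma x =
  negate (log (sigma * sqrt (2 * pi)))
    - (x - mu) ^ (2 :: Int) / (2 * sigma * sigma)
```

The same trick does not carry over directly to the inverse cdfs mentioned above, which is why a fork (or reimplementation) of those parts of the statistics package would be needed.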

adamConnerSax commented 1 year ago

Just checking in. I still want to try to produce Stan code from monad-bayes somehow. My Stan AST work is fairly operational though pretty sparse in terms of supporting all the math functions and distributions that Stan has. But that is easy enough to expand.

What is stopping me is that I'm not at all sure whether the way monad-bayes represents the model is straightforward enough to "translate" to Stan. My best guess is that I would need to develop a free-monad sort of thing which has a MonadSample instance, and then interpret that free structure into Stan code.

But a few things are immediately tricky. E.g., I don't think MonadCond makes sense in Stan since Stan is, effectively, scoring the paths. But a hard condition is a constraint in Stan, I think. So some of MonadCond might need to be present? But I'm not sure how to manage that.

Anyway, I'm starting to think about it! If anyone has concrete ideas for how to get the Stan sort of items from a monad-bayes model, I'd love to hear them. Once I have the data sources, parameters (and transformations) and the model (for priors and data), making that into Stan code should be straightforward.

reubenharry commented 1 year ago

@turion Thanks! You can see the somewhat messy branch with the WIP attached to this PR: https://github.com/tweag/monad-bayes/pull/171 You're probably right that a fork would be good. What I did was to use multiparameter typeclasses for MonadSample and MonadCond, and then laboriously change the various instances. It wasn't entirely trivial, but I think it roughly works. Look in Control.Monad.Bayes.Class to see the new version of the MonadSample and MonadCond typeclasses, and Control.Monad.Bayes.Traced.Diff for the progress on HMC.

@adamConnerSax So there's currently a free monad instance of MonadSample defined in Control.Monad.Bayes.Free (soon to be Control.Monad.Bayes.Density.Free). Maybe one could work from that, or define a Stan instance of MonadSample directly? As for MonadCond, it also amounts to scoring the paths in monad-bayes, but hard conditioning is not done with a separate mechanism.
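For orientation, the free-monad idea can be sketched like this (a rough sketch of the shape, not the actual contents of Control.Monad.Bayes.Free):

```haskell
import Control.Monad.Trans.Free (FreeT, liftF)  -- from the "free" package

-- The single sampling effect: draw a uniform value in [0,1] and continue.
newtype SamF a = Random (Double -> a)

instance Functor SamF where
  fmap f (Random k) = Random (f . k)

-- A probabilistic program as a syntax tree over SamF. Different
-- interpreters -- a sampler, a density calculator, or in principle a
-- Stan code generator -- can then walk this tree.
type FreeSampler m = FreeT SamF m

random' :: Monad m => FreeSampler m Double
random' = liftF (Random id)
```

The difficulty turion raises below is that the continuations inside `Random` are opaque Haskell functions, which is exactly what makes naive translation to Stan hard.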

To make this more concrete, if you have an example of a simple Stan AST corresponding to e.g.

program = do
  x <- beta 1 1
  factor (normalPdf 0 1 x)
  return x

that might be helpful to see, to understand how to do the translation.

turion commented 1 year ago

Funny! I've just recently started to toy around with a Haskell frontend to Stan. Uploaded it to github now: https://github.com/turion/mc-stan. An internet search for anything like that didn't turn up much, because unfortunately there already is a tool called stan in the Haskell ecosystem, otherwise I would have tried to reuse existing code. But that can still happen.

I think it's not possible to produce Stan code from much of monad-bayes. You can implement a monad to capture a Stan model syntactically, as you describe. But already for the probability distributions, you don't want to use semantic functions (as imported into monad-bayes from the statistics package); they should instead stay purely syntactic, so they can be translated to Stan functions. This is troublesome already if you want to make a MonadSample instance. The type of uniform is MonadSample m => Double -> Double -> m Double, with the concrete Haskell Double type. How would you interpret this code:

f :: Double -> Double
f x = sin (x * x) + 1  -- "get creative here": any opaque Haskell function

do
  x <- random
  let y = f x
  uniform y $ y + 1

There is no chance to translate an arbitrary function Double -> Double to Stan code, short of writing a Haskell compiler. The only ways I know around this are either a free Applicative with a Coyoneda transform, which limits the expressiveness a lot, or restricting the type signatures to some symbolic (like I've drafted in mc-stan) or existential type (Floating a => a, as @reubenharry is wishing for as well). But since MonadSample pins down Double, there is no chance.

I think Double is in general a poor choice, because it prevents any other kind of real-number implementation, like Cauchy sequences or symbolic computation. Two possible ways around this would be if either the MonadSample class had an associated type replacing Double, or if all Doubles were replaced by Floating a => a (possibly a different type class, I'm not sure).
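To make the symbolic route concrete, here is one way a small expression type could stand in for Double (a toy sketch in the spirit of what mc-stan drafts; all names are made up):

```haskell
-- A tiny symbolic real-number type: its Num instance builds a syntax
-- tree instead of computing, so code that is polymorphic in its number
-- type can be reinterpreted as (say) Stan expressions.
data Expr
  = Lit Double
  | Var String
  | Add Expr Expr
  | Mul Expr Expr
  | Neg Expr
  | Abs Expr
  | Signum Expr
  deriving Show

instance Num Expr where
  (+) = Add
  (*) = Mul
  negate = Neg
  abs = Abs
  signum = Signum
  fromInteger = Lit . fromInteger
```

Any function of type Num a => a -> a applied to `Var "x"` now yields an inspectable tree rather than an opaque Double, which is exactly what a translator needs and what the concrete Double in MonadSample rules out.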

turion commented 1 year ago

Ah sorry, didn't see your comment while writing this novella :laughing:

What I did was to use multiparameter typeclasses for MonadSample and MonadCond, and then laboriously change the various instances. It wasn't entirely trivial, but I think it roughly works. Look in Control.Monad.Bayes.Class to see the new version of the MonadSample and MonadCond typeclasses

Yes, great! I think I would have recommended an associated type instead, because the monad will usually determine the type of real numbers, but your version forces every monad to be polymorphic in it. If you do this:

class (RealFloat (Real m)) => MonadSample m where
  type Real m :: *
  random :: m (Real m)
  -- ...

instance RealFloat n => MonadSample (Enumerator n) where
  type Real (Enumerator n) = n
  -- ...

You can be more expressive that way, because you could also define specific instances like:

instance MonadSample IO where
  type Real IO = Double
  uniform = someUniformIOImplementation

Plus, you don't break all your users' polymorphic code of type MonadSample m => ....

Maybe one could work from that, or defined a Stan instance for MonadSample directly?

I think the performance of Stan will only be unlocked if one translates the distributions symbolically, so there's no way around mirroring the Stan syntax to some extent.

adamConnerSax commented 1 year ago

@turion Yep, I also think it's hard to translate monad-bayes to Stan as is. But maybe there could be a more abstract way of specifying models which could then be implemented as current monad-bayes or Stan? Like a Bayesian-network sort of thing?

BTW, in your Haskell frontend, you mention prior work. One of those is mine and you are welcome to use it if it's useful. A more current version is here: https://github.com/adamConnerSax/haskell-stan. I like the AST building parts and I am less convinced that the parts which manage data and the interface with cmdstan are ideal. But YMMV. Let me know if I can make parts of it more useful!

reubenharry commented 1 year ago

@turion Ah, this is an awesome suggestion, and it seems like the way to go. I'll try it out soon. There's an old branch of monad-bayes that does something similar, but not quite the same (see here: https://github.com/tweag/monad-bayes/tree/autodiff), and that branch had actually even implemented HMC. But I found it difficult to understand, and the types were complicated, so I'm inclined to just start anew and bring in pieces when needed.

By the way, I noticed from your website that you've done a lot of FRP (e.g. dunai). This is a bit of a tangent, so maybe I can switch it to a different issue, but do you have thoughts about implementing real-time inference (as in https://arxiv.org/abs/1908.07563) using Yampa/dunai? The idea of modelling a stochastic process or real-time inference using FRP seems fitting.

@adamConnerSax I agree that it would be nice to have a separate way of building Bayes nets and MRFs that reifies the structure, and which is then convertible into either monad-bayes or Stan. Something like just declaring a graph structure explicitly. I don't in principle see what the roadblock would be there. The only reason I hadn't pursued it before is that I wasn't entirely sure what an appropriate graph library in Haskell would be.

dataopt commented 1 year ago

@reubenharry I have wanted to get a better handle on algebraic-graphs. There are benchmarking results (from 2018) for a number of Haskell graph libraries.

reubenharry commented 1 year ago

@dataopt Thanks, algebraic-graphs looks like it would be good for the job.

@turion Your suggested refactor worked great! WIP branch is here: https://github.com/tweag/monad-bayes/pull/177

turion commented 1 year ago

By the way, I noticed from your website that you've done a lot of FRP (e.g. dunai). This is a bit of a tangent, so maybe I can switch it to a different issue, but do you have thoughts about implementing real-time inference (as in https://arxiv.org/abs/1908.07563) using Yampa/dunai? The idea of modelling a stochastic process or real-time inference using FRP seems fitting.

Hah, you've guessed my end game! :laughing: Yes, FRP (with side effects!) and stochastic processes are a perfect match, and the ProbZelus people are exactly right in pursuing this path. (It seems that it is my fate to always reimplement in Haskell everything that Marc Pouzet works on.) I'll open a separate issue if you want to discuss this and share some ideas.

dataopt commented 1 year ago

@turion The topic sounds interesting! I would love to follow along a discussion on it.

reubenharry commented 1 year ago

@turion Oh, excellent! Yes, please do :) I am also very interested in this, and would love to help.

idontgetoutmuch commented 1 year ago

@turion I am sure you are aware that you need to be a bit careful with FRP - for example, I think the package I used was using RK4 so that if you e.g. model planetary motion then energy is not conserved and the planets either spiral into the sun or fly off into the void. Maybe what you are working on allows the user to have more control over the numerical method?

reubenharry commented 1 year ago

@idontgetoutmuch To clarify, I believe this is FRP not for symplectic integration or anything to do with HMC, but for real-time inference, along the lines of https://arxiv.org/abs/1908.07563 . But yes, let's discuss on a separate issue

idontgetoutmuch commented 1 year ago

Is there a branch for any of the HMC work anywhere? @mknorps and I are going to take a look at this but we don't want to re-invent any wheels.

dschrempf commented 1 year ago

Hi!

I wanted to chime in here, not sure if it is the right place. I was trying to use monad-bayes in the past, and I loved the separation of probabilistic inference from the declaration of the model. However, the probabilistic inference part was too hidden, and it was impossible to actually infer things, because either the computation took too long, or the off-the-shelf methods did not succeed in estimating the complicated covariance structure of the posterior.

What I am trying to say: even if you implement HMC, it is important to give the user the ability to set the leapfrog parameters and, even better, to do this automatically (auto-tuning; though there are in turn parameters specifying how the auto-tuning happens). Even when using the NUTS sampler, (auto-)tuning is absolutely mandatory.

My question is: Can tuning be implemented into monad-bayes? Will it be possible to define "specialized" proposals, such as random walk proposals with a very specific covariance structure? In my opinion, this is absolutely necessary if one wants to infer parameters of real world problems. I would really love to see these features implemented, but I know it is a lot of work.
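To make the tuning knobs concrete: the leapfrog parameters in question are the step size and the number of steps per proposal. A minimal sketch of the integrator with both knobs exposed (hypothetical names, not the monad-bayes or mcmc API):

```haskell
-- One HMC proposal: numSteps leapfrog steps of size eps, starting from
-- position q and momentum p, following the gradient of the log density.
-- These two knobs (eps, numSteps) are what auto-tuning adjusts.
leapfrog
  :: ([Double] -> [Double])  -- gradient of the log density
  -> Double                  -- eps: leapfrog step size
  -> Int                     -- numSteps: leapfrog steps per proposal
  -> ([Double], [Double])    -- (position, momentum)
  -> ([Double], [Double])
leapfrog grad eps numSteps = go numSteps
  where
    go 0 qp = qp
    go n (q, p) =
      let pHalf = zipWith (\p_ g  -> p_ + 0.5 * eps * g) p (grad q)
          q'    = zipWith (\q_ p_ -> q_ + eps * p_) q pHalf
          p'    = zipWith (\p_ g  -> p_ + 0.5 * eps * g) pHalf (grad q')
      in go (n - 1) (q', p')
```

A mass matrix (or at least a diagonal scaling of the momenta) would be the third knob to expose; it plays the role of the proposal covariance mentioned above.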

I am specifically interested because I am working on mcmc, an MCMC sampler completely implemented in Haskell. I provide all these functions (HMC, NUTS, auto-tuning, etc.), but I see the merits of a PPL, and it would be awesome if we got a PPL with advanced features in Haskell!

Thanks!

PS: Anything with ad is slow (as you said). The problem is that it does not really work on unboxed vectors. Specialization on Double improved runtime a lot, but still, meh. "Specialized" random walk proposals are amazing though, but they require manual intervention... With "specialized" I mean that they are informed about the covariance structure of the posterior. Not sure if I'm explaining myself!

EDIT: One more thing, because I read it in the main post of this thread: Riemannian HMC seems to be harder than we thought. I do not think it is impossible to implement, but so far nobody has done it convincingly (at least to my knowledge).

reubenharry commented 1 year ago

Hi @dschrempf, great to hear from you! What you're saying makes a lot of sense, and I agree. To my knowledge, Gen is the probabilistic programming language with the most support for customizable MCMC proposals for arbitrary probabilistic programs. I think the best approach for monad-bayes in the long run is to borrow from Gen, and if I had more time, that's what I'd work on (see e.g. https://github.com/psg-mit/probzelus-haskell). In theory, having customizable proposals is easy to do in monad-bayes, but automating the user experience to make writing ill-formed proposals impossible is hard.

If you're interested in trying to incorporate some of the mcmc kernels into monad-bayes, I can definitely point you to where in the code the MCMC step is actually implemented in monad-bayes and explain how it works.

Re. HMC, there is some work on applying it to distributions with stochastic control flow, but in general it assumes a fixed model, so its applicability to monad-bayes rests on that assumption.

idontgetoutmuch commented 1 year ago

NaiveSampling.pdf

See here for the source: https://github.com/tweag/monad-bayes/tree/Generalising-the-sampler. You can run the entire document by

nix-shell shell.nix
ghci -isrc
:l NaiveSampling.lhs

You can generate the document by

nix-shell shell.nix
lhs2tex NaiveSampling.lhs > NaiveSampling.tex
pdflatex NaiveSampling.tex

You might need to do bibtex naiveSampling also.

idontgetoutmuch commented 1 year ago

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods, by Akihiko Nishimura, David Dunson, Jianfeng Lu

reubenharry commented 1 year ago

Thanks for writing this up! Extremely useful for me :)

idontgetoutmuch commented 1 year ago

Thanks for writing this up! Extremely useful for me :)

I am not sure what you are referring to - if it's my notes, please be very careful, I now think it contains some misunderstandings.

reubenharry commented 1 year ago

It's your notes, yes. I haven't started reading yet, and will hold off / be careful, but what I meant is that I've been wanting to learn about this new generalized perspective on MCMC for a while, so I envision this might be useful in future.

idontgetoutmuch commented 1 year ago

My current plan is to use the code here: https://github.com/fzaiser/nonparametric-hmc as a template.

My Attempts

https://github.com/fzaiser/nonparametric-hmc/issues/4

https://github.com/cmaarkol/nonparametric-mh/issues/1

Some References

Nonparametric Hamiltonian Monte Carlo, by Carol Mak, Fabian Zaiser, Luke Ong. https://arxiv.org/pdf/2106.10238.pdf

Automating Involutive MCMC using Probabilistic and Differentiable Programming, by Marco Cusumano-Towner, Alexander K. Lew, Vikash K. Mansinghka. https://arxiv.org/pdf/2007.09871.pdf

Involutive MCMC: a Unifying Framework, by Kirill Neklyudov, Max Welling, Evgenii Egorov, Dmitry Vetrov. https://arxiv.org/pdf/2006.16653.pdf

LazyPPL: laziness and types in non-parametric probabilistic programs, by Hugo Paquet, Sam Staton. https://openreview.net/pdf?id=yHox9OyegeX https://bitbucket.org/lazyppl/lazyppl-src/commits/

https://github.com/idontgetoutmuch/nonparametric-mh https://github.com/fzaiser/nonparametric-hmc/issues/4

LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models http://proceedings.mlr.press/v89/zhou19b/zhou19b.pdf https://github.com/bayesianbrad/PyLFPPL

Probably Tangential

Reactive Probabilistic Programming https://arxiv.org/pdf/1908.07563.pdf

Affine Monads and Lazy Structures for Bayesian Programming, by Swaraj Dash, Younesse Kaddar, Hugo Paquet, Sam Staton. https://arxiv.org/pdf/2212.07250.pdf

Delayed Sampling and Automatic Rao–Blackwellization of Probabilistic Programs https://arxiv.org/pdf/1708.07787.pdf

Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference https://arxiv.org/pdf/2112.00364.pdf