igarnier / prbnmcn-dagger

A library for probabilistic programming
https://igarnier.github.io/prbnmcn-dagger/

Stiffness and MH limitations -- HMC? #18

Open nilsbecker opened 6 months ago

nilsbecker commented 6 months ago

This is more of a long-term feature request. Sorry for rambling. TL;DR: dagger needs HMC.

The LMH algorithm works by sampling individual elementary stochastic events from their respective priors, then making Metropolis-Hastings steps for the whole program trace, based on these prior proposals and on additional scoring with a likelihood. There is a fundamental problem with this procedure: in problems of practical interest the likelihood score will often depend sensitively on a few of the elementary events, so prior proposals for any such event will rarely be accepted, resulting in poor sampling. E.g. if a stochastic program contains 1000 elementary events and one particularly interesting but pathological event has only a 1% acceptance rate, then, since each step resamples a single randomly chosen event, we'd need on the order of 1000 / 0.01 = 100_000 steps to get a fresh accepted sample for that event.
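For concreteness, the scheme I mean looks roughly like the sketch below (this is not dagger's actual implementation; `sample_prior` and `log_likelihood` are hypothetical stand-ins): each step resamples one elementary event from its prior and accepts with the likelihood ratio, which is exactly where a low per-event acceptance rate hurts.

```ocaml
(* Rough single-site MH over a fixed-length trace, in the spirit of LMH
   (not dagger's actual implementation). [sample_prior i] draws event i from
   its prior and [log_likelihood] scores a whole trace; both are hypothetical
   stand-ins. With prior proposals the Hastings correction cancels, so the
   acceptance probability reduces to the likelihood ratio. *)
let single_site_mh ~sample_prior ~log_likelihood ~steps init =
  let n = Array.length init in
  let current = Array.copy init in
  let current_ll = ref (log_likelihood current) in
  for _ = 1 to steps do
    (* pick one elementary event at random and resample it from its prior *)
    let i = Random.int n in
    let proposal = Array.copy current in
    proposal.(i) <- sample_prior i;
    let proposal_ll = log_likelihood proposal in
    if log (Random.float 1.0 +. epsilon_float) < proposal_ll -. !current_ll
    then begin
      current.(i) <- proposal.(i);
      current_ll := proposal_ll
    end
  done;
  current
```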

One example of this situation is given by the configurations of a 2D polymer model with rigid links and relatively stiff elastic joints between them. The endpoint of the chain then depends sensitively on the initial direction, so any likelihood constraining the endpoint to a small region of the plane poses a hard problem for LMH. A whole subfield of computational polymer physics has been tackling this kind of problem for decades, resulting in highly refined and effective purpose-built MC move sets.
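To make that concrete, the kind of target density I have in mind looks roughly like the following (just a sketch, not my actual toy implementation; `kappa`, `sigma` and `target` are illustrative parameters): a stiff harmonic prior on the bending angles plus a Gaussian likelihood pinning the chain endpoint.

```ocaml
(* Sketch of the unnormalized log-density of a stiff 2D polymer with rigid
   unit-length links: [angles.(i)] is the bending angle at joint i, penalized
   by a stiff harmonic term, and the chain endpoint is softly pinned to a
   target point by a Gaussian likelihood. [kappa], [sigma] and [target] are
   illustrative parameters. *)
let log_density ~kappa ~sigma ~target angles =
  let bending = ref 0.0 in
  let dir = ref 0.0 and x = ref 0.0 and y = ref 0.0 in
  Array.iter
    (fun theta ->
      bending := !bending +. (0.5 *. kappa *. theta *. theta); (* stiff joint *)
      dir := !dir +. theta;  (* the walking direction accumulates the bends *)
      x := !x +. cos !dir;
      y := !y +. sin !dir)
    angles;
  let tx, ty = target in
  let d2 = ((!x -. tx) ** 2.) +. ((!y -. ty) ** 2.) in
  (* a small [sigma] makes the endpoint constraint sharp, so the density
     depends sensitively on the first few angles, which breaks prior proposals *)
  -. !bending -. (d2 /. (2. *. sigma *. sigma))
```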

A general-purpose sampling procedure that can automatically discover highly correlated collective updates of elementary events that achieve a high acceptance probability is a formidable challenge. To my knowledge, the only strategy that stands a chance is Hamiltonian Monte Carlo (I'd love to be corrected if this is mistaken). HMC uses gradients of the log-density to propose large, collective updates by simulating Hamiltonian dynamics, in analogy with classical mechanics. HMC is implemented in a number of probabilistic programming frameworks, including Stan, PyMC and Turing.jl.
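For reference, the core of an HMC transition is fairly small. Here is a self-contained sketch, assuming a user-supplied log-density and its gradient (this is generic HMC, not a proposal for dagger's API):

```ocaml
(* One HMC transition: draw a Gaussian momentum, integrate Hamiltonian
   dynamics with a leapfrog integrator, then accept or reject based on the
   change in the Hamiltonian. [grad] returns the gradient of [log_density]. *)
let hmc_step ~log_density ~grad ~eps ~n_leapfrog q0 =
  let n = Array.length q0 in
  (* auxiliary standard-Gaussian momentum via Box-Muller *)
  let gaussian () =
    let u1 = Random.float 1.0 +. epsilon_float and u2 = Random.float 1.0 in
    sqrt (-2. *. log u1) *. cos (2. *. Float.pi *. u2)
  in
  let p0 = Array.init n (fun _ -> gaussian ()) in
  let q = Array.copy q0 and p = Array.copy p0 in
  let kick scale =
    Array.iteri (fun i g -> p.(i) <- p.(i) +. (scale *. eps *. g)) (grad q)
  in
  let drift () = Array.iteri (fun i pi -> q.(i) <- q.(i) +. (eps *. pi)) p in
  (* leapfrog: half kick, alternate full drifts and kicks, end on a half kick *)
  kick 0.5;
  for k = 1 to n_leapfrog do
    drift ();
    kick (if k = n_leapfrog then 0.5 else 1.0)
  done;
  let kinetic p = Array.fold_left (fun acc pi -> acc +. (0.5 *. pi *. pi)) 0.0 p in
  let h0 = kinetic p0 -. log_density q0 in
  let h1 = kinetic p -. log_density q in
  (* Metropolis accept/reject on the Hamiltonian error of the trajectory *)
  if log (Random.float 1.0 +. epsilon_float) < h0 -. h1 then q else q0
```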

To be competitive in difficult sampling problems, prbnmcn-dagger would need to take steps to implement HMC. It's unclear to me how to obtain the required derivatives, though. Owl appears to be stagnating and may be at risk of obsolescence. Potentially, the upcoming Ocannl will be an option for generating likelihood derivatives via algorithmic differentiation.
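Just to illustrate that derivatives don't strictly require a big framework, here is a toy forward-mode AD sketch with dual numbers (for HMC over many parameters one would really want reverse mode, which is what Ocannl aims to provide, so this only shows the principle):

```ocaml
(* Toy forward-mode AD with dual numbers: each value carries its derivative
   with respect to a single chosen input variable. Purely illustrative. *)
type dual = { v : float; d : float }     (* value and derivative *)

let var v = { v; d = 1.0 }               (* the variable we differentiate by *)
let ( *: ) a b = { v = a.v *. b.v; d = (a.d *. b.v) +. (a.v *. b.d) }
let sin' a = { v = sin a.v; d = cos a.v *. a.d }

(* derivative of x * sin x at x = 1.3, i.e. sin 1.3 +. 1.3 *. cos 1.3 *)
let () =
  let f x = x *: sin' x in
  let { v; d } = f (var 1.3) in
  Printf.printf "value = %f  derivative = %f\n" v d
```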

nilsbecker commented 6 months ago

if there is interest i could share a toy implementation of the 2d polymer using lmh.

igarnier commented 6 months ago

Yes, tough examples are always nice! If you submit a PR here I'll integrate it into the examples directory. I agree with you: lmh has the advantage that it can be applied to any situation, but on the other hand it's not very good at staying in zones of high probability. Rejection rates become super high on anything remotely nontrivial.

I would really love to have an HMC implementation in dagger. As you point out, in order to compute gradients we'd need a robust implementation of algorithmic differentiation in OCaml. I tried to come up with my own implementation, but it's tough to develop something robust with state-of-the-art performance in my free time... I'd rather not depend on Owl because it's too big and not well maintained. OCannl looks nice, I didn't try it yet!

nilsbecker commented 6 months ago

It would be super nice if there were a go-to AD library in OCaml. Sigh. At some point I think @mmottl had something, but that was never open-sourced AFAIK. Ocannl is supposed to be (alpha) released soon; we'll see if that will be viable. I sure hope so!

nilsbecker commented 6 months ago

also, i found this: https://github.com/mbarbin/micrograd, just for reference. probably not enough?