DPMM and SMC inference with linear scaling

marcoct commented 7 years ago

First, do the version that scales O(n) per generation where n is the number of data points (resulting in an O(n^2) Gibbs sweep).

marcoct commented 7 years ago

Started on the O(N^2) version in https://github.com/probcomp/gen-examples/commit/da36f4b92e0df7bba15b54630ca40ea4e329c5ab

marcoct commented 7 years ago

notes from meeting today with vikash:

write the naive generative version
write the collapsed version that stores in the trace the sufficient stats, that are mutated
exchangeability
the inference meta-programmer needs to know that there is
if you know those things, then the programmer knows that the program is equivalent to the naive version
gen does not know those equivalences, the programmer needs to knows
misbehaving primitive that happens to mutate the trace -- you can violate the simpler functional contract, you just have to know what you're doing. don't worry about the safe infrastructure for doing that.

marcoct commented 7 years ago

Here is an implementation of a DPMM with a Gibbs sweep inference algorithm that scales in O(N) per sweep. It makes use of sufficient statistics for CRP and cluster components that are stored in the trace.

https://github.com/probcomp/gen-examples/blob/c851ba157ae01c9b7e40b8dc69fb7700260aa01a/dirichlet-process-mixture-model/dpmm.jl

It runs, but has not been tested for correctness or inference quality, etc.

An alternative approach, that may be implemented as an example of module networks, would instead define a single multivariate exchangeable module that generates a vector of N data points, with a sub-trace that contains the vector of data and the sufficient statistics. Then, the incorporate! and unincorporate! could possibly be hidden from the inference programmer, who would interact with the module using a version of the standard trace interface.

marcoct commented 7 years ago

Wrote a notebook with DPMM with Gibbs sampling O(N) per sweep and a custom D3 trace rendering including a transition animation.

Notebook on Github: https://github.com/probcomp/gen-examples/blob/master/dirichlet-process-mixture-model/dpmm.ipynb
Notebook rendered as PDF: https://drive.google.com/open?id=0B_amg0-n5yelbTlwS0N6Y0pWRE0

Next step is to write the SMC inference program.

This effort brought to light a few ideas for how to formalize this inference programming model:

Traces should usually be fully constrained, except when being acted upon by an inference operator. Each inference operator is responsible for unconstraining the values that it mutates and re-constraining them afterwards. This is a useful convention because some inference operators may make use of @generate, which will replace values for unconstrained (or un-intervened) random choices. Each inference operator should only need to know about the parts of the trace that are relevant to it, and should not have to worry about ensuring the rest of the trace is constrained.
Constraining (or intervening) on a value of a trace can cause the trace to become non-coherent. This popped up when implementing the DPMM because constraining the data does not cause the sufficient statistics to be updated in this implementation. This seems error prone, as discussed in https://github.com/probcomp/gen-examples/issues/8
This inference operator has to duplicate knowledge that is already stored in the trace (i.e. the identity of the random primitives used in the program and the sufficient statistics storage) . In the future, an inference compiler could construct this inference program from the original probabilistic program and a specification of the inference operator's template.

marcoct commented 7 years ago

A CRP joint generator with a custom trace type that exposes the cluster assignments as addressable values, and internally maintains sufficient statistics, was added here: https://github.com/probcomp/Gen.jl/blob/b51105619075571f757a4af7350e8bd977d36bbc/src/primitives/crp_joint_generator.jl

The joint generator may or may not be used for this DPMM example. The joint generator is more desirable than explicit draw and incorporate! sequence in the probabilistic program because the sufficient statistics are naturally and automatically associated with the CRP generator. In the sequential version, the sufficient statistics are manually recorded with a separate tag directive, and the inference programmer needs to make this association themselves.

marcoct commented 6 years ago

Added generic serial basic SMC and conditional SMC algorithms for state-space models without rejuvenation to Gen in branch https://github.com/probcomp/Gen.jl/blob/20170804-marcoct-generators/src/inference/state_space_smc.jl

Tested these with particle marginal MH and particle Gibbs on a toy model from the particle MCMC paper in https://github.com/probcomp/gen-examples/tree/20170810-marcoct-generators/particle-mcmc

probcomp / GenExperimental.jl

DPMM and SMC inference with linear scaling #20