Open marcoct opened 7 years ago
Started on the O(N^2) version in https://github.com/probcomp/gen-examples/commit/da36f4b92e0df7bba15b54630ca40ea4e329c5ab
notes from meeting today with vikash:
Here is an implementation of a DPMM with a Gibbs sweep inference algorithm that scales in O(N) per sweep. It makes use of sufficient statistics for CRP and cluster components that are stored in the trace.
It runs, but has not been tested for correctness or inference quality, etc.
An alternative approach, that may be implemented as an example of module networks, would instead define a single multivariate exchangeable module that generates a vector of N data points, with a sub-trace that contains the vector of data and the sufficient statistics. Then, the incorporate! and unincorporate! could possibly be hidden from the inference programmer, who would interact with the module using a version of the standard trace interface.
Wrote a notebook with DPMM with Gibbs sampling O(N) per sweep and a custom D3 trace rendering including a transition animation.
Next step is to write the SMC inference program.
This effort brought to light a few ideas for how to formalize this inference programming model:
Traces should usually be fully constrained, except when being acted upon by an inference operator. Each inference operator is responsible for unconstraining the values that it mutates and re-constraining them afterwards. This is a useful convention because some inference operators may make use of @generate
, which will replace values for unconstrained (or un-intervened) random choices. Each inference operator should only need to know about the parts of the trace that are relevant to it, and should not have to worry about ensuring the rest of the trace is constrained.
Constraining (or intervening) on a value of a trace can cause the trace to become non-coherent. This popped up when implementing the DPMM because constraining the data does not cause the sufficient statistics to be updated in this implementation. This seems error prone, as discussed in https://github.com/probcomp/gen-examples/issues/8
This inference operator has to duplicate knowledge that is already stored in the trace (i.e. the identity of the random primitives used in the program and the sufficient statistics storage) . In the future, an inference compiler could construct this inference program from the original probabilistic program and a specification of the inference operator's template.
A CRP joint generator with a custom trace type that exposes the cluster assignments as addressable values, and internally maintains sufficient statistics, was added here: https://github.com/probcomp/Gen.jl/blob/b51105619075571f757a4af7350e8bd977d36bbc/src/primitives/crp_joint_generator.jl
The joint generator may or may not be used for this DPMM example. The joint generator is more desirable than explicit draw and incorporate! sequence in the probabilistic program because the sufficient statistics are naturally and automatically associated with the CRP generator. In the sequential version, the sufficient statistics are manually recorded with a separate tag
directive, and the inference programmer needs to make this association themselves.
Added generic serial basic SMC and conditional SMC algorithms for state-space models without rejuvenation to Gen in branch https://github.com/probcomp/Gen.jl/blob/20170804-marcoct-generators/src/inference/state_space_smc.jl
Tested these with particle marginal MH and particle Gibbs on a toy model from the particle MCMC paper in https://github.com/probcomp/gen-examples/tree/20170810-marcoct-generators/particle-mcmc
First, do the version that scales O(n) per generation where n is the number of data points (resulting in an O(n^2) Gibbs sweep).