Implement MML inference extensions

fsaad commented 7 years ago

Required MML expressions:

ASSIGN <subproblem> TO <json-literal-value | PRIOR SAMPLE> [FOR MODEL <i>]
INFER <subproblem> [FOR MODEL <i>]

where <subproblem> specifies something like a subtrace of the CrossCat model program trace (following the DP chapter), and at minimum makes the following latent variables addressable:

column-concentration -> real
column-hyperparameters -> array of arrays/scalars
column-hyperparameters[c] -> array or scalar
view-assignments -> array of discrete
view-assignments[c] -> discrete
clustering-concentration[v] -> real
cluster-assignments[v] -> array of discrete
cluster-assignments[v][r] -> discrete
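A minimal sketch of how such addresses could be resolved against a JSON-like state, in Python. The `state` dict, its values, and the helper names `parse_address` and `lookup` are hypothetical and only illustrate the addressing scheme; they are not bayeslite's API.

```python
import re

# Hypothetical toy state for one model. The keys mirror the addressable
# variables listed above; the values are made up for illustration.
state = {
    "column-concentration": 1.5,
    "view-assignments": [0, 0, 1],
    "clustering-concentration": [0.8, 2.0],
    "cluster-assignments": [[0, 1, 1, 0], [0, 0, 0, 1]],
}

# A name made of lowercase letters and hyphens, then zero or more [int] suffixes.
ADDRESS = re.compile(r"^([a-z-]+)((?:\[\d+\])*)$")

def parse_address(address):
    """Split 'name[i][j]' into ('name', [i, j])."""
    match = ADDRESS.match(address)
    if match is None:
        raise ValueError("malformed address: %r" % (address,))
    name, suffix = match.groups()
    indexes = [int(i) for i in re.findall(r"\[(\d+)\]", suffix)]
    return name, indexes

def lookup(state, address):
    """Follow the parsed indexes into the nested state and return the value."""
    name, indexes = parse_address(address)
    value = state[name]
    for i in indexes:
        value = value[i]
    return value

print(lookup(state, "cluster-assignments[1][3]"))
```

An out-of-range index raises `IndexError` here, which is one way the "indexes differ per model" problem would surface.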

INFER <subproblem> does some inference on <subproblem> (maybe a Gibbs scan), and ASSIGN sets the value to a given JSON literal. By default both operations mutate all models.

Non-trivial questions:

If all operations mutate all models, how can we guarantee that a given view/cluster assignment actually applies to every model? Each model has a different set of indexes.
The end-user should also be able to view a JSON representation of each state; otherwise they will not know which indexes to use for reassignment.

Candidate solutions:

Add ability to query a subproblem, which returns a JSON blob: QUERY <subproblem> [FOR MODEL <i>]
Extend each addressable variable with an optional model index, and go best-effort.
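Both candidates can be sketched together in Python, assuming each model's state is a plain dict of lists. The functions `query` and `assign_best_effort` and the sample values are hypothetical: `query` returns a JSON blob for one or all models, and `assign_best_effort` applies an assignment wherever the index exists and reports the models it skipped.

```python
import json

def query(models, name, model=None):
    """Return a JSON string of the named variable for one model or for all."""
    selected = range(len(models)) if model is None else [model]
    return json.dumps({str(i): models[i][name] for i in selected}, sort_keys=True)

def assign_best_effort(models, name, index, value):
    """Set models[i][name][index] = value wherever the index is valid.

    Returns the indexes of the models that were skipped because the
    address does not exist in their state.
    """
    skipped = []
    for i, state in enumerate(models):
        if 0 <= index < len(state[name]):
            state[name][index] = value
        else:
            skipped.append(i)
    return skipped

# Two hypothetical models with a different number of views.
models = [
    {"clustering-concentration": [1.0, 2.0]},
    {"clustering-concentration": [0.5]},
]
skipped = assign_best_effort(models, "clustering-concentration", 1, 3.0)
print(skipped)
print(query(models, "clustering-concentration"))
```

Returning the skipped models, rather than failing silently, lets the caller decide whether a partial assignment is acceptable.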

fsaad commented 6 years ago

https://github.com/probcomp/bayeslite/commit/e4138e18303a345ffc7d76b47c5541747b163d3e

fsaad commented 6 years ago

Specifying deterministic constraints

ALTER ANALYSIS SCHEMA <s> [ANALYSES (<indexes>)]

    ENSURE <variables...> DEPENDENT
    ENSURE <variables...> INDEPENDENT
    ENSURE <variables...> IN CONTEXT OF <variable>
    ENSURE <variables...> IN SINGLETON CONTEXT

    ENSURE ROWS <rows...> IN CLUSTER OF <row> WITHIN CONTEXT OF <variable>
    ENSURE ROWS <rows...> IN SINGLETON CLUSTER WITHIN CONTEXT OF <variable>    

    SET CONTEXT CONCENTRATION PARAMETER TO <value>
    SET ROW CLUSTERING CONCENTRATION WITHIN CONTEXT OF <variable> TO <value>
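As a rough illustration of what the first two ENSURE clauses might do to the variable partition, here is a Python sketch that treats the partition as a dict from variable name to context (view) index. The function names and the sample variables are hypothetical, and the sketch ignores the row clusterings that a real implementation would have to create or migrate when a variable changes context.

```python
def ensure_dependent(view_of, variables):
    """Move every listed variable into the context of the first one."""
    target = view_of[variables[0]]
    for v in variables:
        view_of[v] = target

def ensure_independent(view_of, variables):
    """Make the listed variables pairwise independent.

    A variable keeps its context unless an earlier listed variable already
    occupies it, in which case it is moved to a fresh context.
    """
    seen = set()
    next_view = max(view_of.values()) + 1
    for v in variables:
        if view_of[v] in seen:
            view_of[v] = next_view
            next_view += 1
        seen.add(view_of[v])

# Hypothetical partition of four variables into three contexts.
view_of = {"age": 0, "height": 0, "weight": 1, "income": 2}
ensure_dependent(view_of, ["age", "income"])
ensure_independent(view_of, ["age", "height", "weight"])
print(view_of)
```

Moving only the colliding variables is one possible design choice; placing every listed variable in its own fresh context would instead correspond to the SINGLETON CONTEXT clause.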

Stochastic mutation via Gibbs sampling

ANALYZE <s> [ANALYSES (<indexes>)] FOR <n> ITERATIONS|SECONDS WAIT (

    VARIABLES <variables...>
    ROWS <rows...>

    SUBPROBLEMS (
        VARIABLE HYPERPARAMETERS,
        VARIABLE CLUSTERING,
        VARIABLE CLUSTERING CONCENTRATION,
        ROW CLUSTERING,
        ROW CLUSTERING CONCENTRATION,
    )

    [OPTIMIZED];
)

The tokens <variables...> and <rows...> are either variable/row names, or * to indicate all. Combinations of the above ALTER and SUBPROBLEMS are sufficient to recover several models that CrossCat generalizes, such as single/multi-row cluster models as well as fully-dependent and fully-independent variables.
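To make the last claim concrete, the following Python sketch composes statement text for one restricted model: fix the variable partition with an ENSURE clause, then run inference only over the subproblems that leave that partition untouched. The statements are assembled from the grammar proposed in this thread and are not checked against any released bayeslite syntax; the schema name `s`, the iteration count, and the helper `restricted_model` are assumptions, as is omitting the VARIABLES and ROWS clauses.

```python
ALL_SUBPROBLEMS = [
    "VARIABLE HYPERPARAMETERS",
    "VARIABLE CLUSTERING",
    "VARIABLE CLUSTERING CONCENTRATION",
    "ROW CLUSTERING",
    "ROW CLUSTERING CONCENTRATION",
]

def restricted_model(schema, constraint, frozen):
    """Return (alter, analyze) statement strings for a restricted model.

    `constraint` is the ENSURE clause applied to all variables (*), and
    `frozen` is the set of subproblems excluded from inference so that the
    deterministic constraint is never undone by a later transition.
    """
    kept = [s for s in ALL_SUBPROBLEMS if s not in frozen]
    alter = "ALTER ANALYSIS SCHEMA %s ENSURE * %s" % (schema, constraint)
    analyze = "ANALYZE %s FOR 10 ITERATIONS WAIT (SUBPROBLEMS (%s))" % (
        schema, ", ".join(kept))
    return alter, analyze

# Fully-dependent variables: a single context whose rows are still clustered.
alter, analyze = restricted_model(
    "s", "DEPENDENT",
    frozen={"VARIABLE CLUSTERING", "VARIABLE CLUSTERING CONCENTRATION"})
print(alter)
print(analyze)
```

Passing `"INDEPENDENT"` with the same frozen set would give the fully-independent case under the same assumptions.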

probcomp / bayeslite

Implement MML inference extensions #571
