probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
http://probcomp.csail.mit.edu/software/bayesdb
Apache License 2.0
923 stars 63 forks source link

Implement MML inference extensions #571

Closed fsaad closed 6 years ago

fsaad commented 7 years ago

Required MML expressions:

where <subproblem> specifies something like a subtrace of the CrossCat model program trace (following the DP chapter), and at minimum makes the following latent variables addressable:

INFER <subproblem> does some inference on <subproblem> (maybe a Gibbs scan), and ASSIGN sets the value to a given JSON literal. By default both operations mutate all models.

Non-trivial questions:

Candidate solutions:

fsaad commented 6 years ago

https://github.com/probcomp/bayeslite/commit/e4138e18303a345ffc7d76b47c5541747b163d3e

fsaad commented 6 years ago

Specifying deterministic constraints

ALTER ANALYSIS SCHEMA <s> [ANALYSES (<indexes>)]

    ENSURE <variables..> DEPENDENT
    ENSURE <variables...> INDEPENDENT
    ENSURE <variables...> IN CONTEXT OF <variable>
    ENSURE <variables...> IN SINGLETON CONTEXT

    ENSURE ROWS <rows...> IN CLUSTER OF <row> WITHIN CONTEXT OF <variable>
    ENSURE ROWS <rows...> IN SINGLETON CLUSTER WITHIN CONTEXT OF <variable>    

    SET CONTEXT CONCENTRATION PARAMETER TO <value>
    SET ROW CLUSTERING CONCENTRATION WITHIN CONTEXT OF <variable> TO <value>

Stochastic mutation via Gibbs sampling

ANALYZE <s> [ANALYSES (<indexes>)] FOR <n> ITERATIONS|SECONDS WAIT (

    VARIABLES <variables...>
    ROWS <rows...>

    SUBPROBLEMS (
        VARIABLE HYPERPARAMETERS,
        VARIABLE CLUSTERING,
        VARIABLE CLUSTERING CONCENTRATION,
        ROW CLUSTERING,
        ROW CLUSTERING CONCENTRATION,
    )

    [OPTIMIZED];
)

The tokens <variables...> and <rows...> are either variable/row names, or * to indicate all. Combinations of the above ALTER and SUBPROBLEMS are sufficient to recover several models that CrossCat generalizes, such as single/multi-row cluster models as well as fully-dependent and fully-independent variables.