probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
http://probcomp.csail.mit.edu/software/bayesdb
Apache License 2.0

Elevate "generators" and "populations" to the same level in BQL #469

Open fsaad opened 8 years ago

fsaad commented 8 years ago

Consider

CREATE POPULATION p FOR t(
    a NUMERICAL,
    b NUMERICAL,
);

CREATE GENERATOR g0 FOR p USING cgpm(
);

CREATE GENERATOR g1 FOR p USING cgpm(
    a LOGNORMAL,
    LATENT c CATEGORICAL,
    MODEL b, c GIVEN a USING baz;
);

Currently, the same query against p, g0, and g1 is written as:

ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g0;
ESTIMATE PROBABILITY OF a=2 FROM p MODELED BY g1;

Conceptually, g0, g1, and p are all the same "type" of probabilistic object. Because p answers every BQL query by averaging over g0 and g1, it implements the GPM interface just as each generator does. This aggregation works regardless of the fact that g1 contains an additional model for the LATENT variable c. I therefore suggest the following simplification:

ESTIMATE PROBABILITY OF a=2 FROM p;
ESTIMATE PROBABILITY OF a=2 FROM g0;
ESTIMATE PROBABILITY OF a=2 FROM g1;

My prediction is that most of the MML programs we write for now will not aggregate over different metamodels for the same population, so the additional MODELED BY clause is redundant.
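
To illustrate why p, g0, and g1 can be treated as the same type of object, here is a rough Python sketch. It is not bayeslite code: the `GPM` protocol, the two generator classes, the `Population` class, and `estimate_probability` are all hypothetical names, and the uniform average over generators is an assumption made for illustration only.

```python
import math
from typing import Protocol, Sequence


class GPM(Protocol):
    """Hypothetical minimal interface: anything that can score a value of `a`."""

    def density(self, a: float) -> float: ...


class NormalGenerator:
    """Stand-in for g0: models `a` with a normal distribution (assumed)."""

    def __init__(self, mu: float, sigma: float) -> None:
        self.mu, self.sigma = mu, sigma

    def density(self, a: float) -> float:
        z = (a - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))


class LogNormalGenerator:
    """Stand-in for g1: models `a` as LOGNORMAL; latent variables are ignored here."""

    def __init__(self, mu: float, sigma: float) -> None:
        self.mu, self.sigma = mu, sigma

    def density(self, a: float) -> float:
        if a <= 0.0:
            return 0.0  # a lognormal has no mass at or below zero
        z = (math.log(a) - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (a * self.sigma * math.sqrt(2.0 * math.pi))


class Population:
    """Stand-in for p: averages over its generators, so it also satisfies GPM."""

    def __init__(self, generators: Sequence[GPM]) -> None:
        self.generators = list(generators)

    def density(self, a: float) -> float:
        # Uniform average over generators (an assumption of this sketch).
        return sum(g.density(a) for g in self.generators) / len(self.generators)


def estimate_probability(source: GPM, a: float) -> float:
    """Analogue of ESTIMATE PROBABILITY OF a=<value> FROM <source>."""
    return source.density(a)


g0 = NormalGenerator(mu=0.0, sigma=1.0)
g1 = LogNormalGenerator(mu=0.0, sigma=1.0)
p = Population([g0, g1])

# The same call works for the population and for either generator.
for name, source in [("p", p), ("g0", g0), ("g1", g1)]:
    print(name, estimate_probability(source, 2.0))
```

Because `estimate_probability` depends only on the shared interface, the caller never needs a separate MODELED BY-style argument: naming the object to query is enough.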