probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
http://probcomp.csail.mit.edu/software/bayesdb
Apache License 2.0

design and implement foreign predictors #12

Closed: riastradh-probcomp closed this issue 8 years ago

riastradh-probcomp commented 9 years ago

Still not clear on what these are other than metamodels with only INITIALIZE/ANALYZE and PREDICT/SIMULATE, not MUTUAL INFORMATION or anything else like that.

Need some illustrative examples to generalize from.

vkmvkmvkmvkm commented 9 years ago

BQL SNIPPET

;;;;; load plugins for custom models

.load-custom-model linreg.py "LinearRegression"
.load-custom-model logreg.py "LogisticRegression"

;;;;; set up dependencies

UPDATE SCHEMA FOR t MODEL foo AS CUSTOM MODEL "LogisticRegression" WITH INPUTS bar, baz

UPDATE SCHEMA FOR t MODEL quux AS CUSTOM MODEL "LinearRegression" WITH INPUTS foo, bar, baz

;;;;; illustration of an intentional limitation on model composition: can’t have cycles

UPDATE SCHEMA FOR t MODEL bar AS CUSTOM MODEL "LinearRegression" WITH INPUTS foo, quux
===> ERROR: can’t have cyclic dependencies. Ignoring!
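
A minimal Python sketch of how the cycle check above could work, assuming the schema is kept as a map from each modeled column to its input columns. The names `SchemaGraph`, `add_model`, and `CyclicDependencyError` are hypothetical and are not part of bayeslite; this only illustrates the intended behaviour of rejecting the third UPDATE SCHEMA statement.

```python
# Hypothetical sketch: none of these names are bayeslite API.
class CyclicDependencyError(ValueError):
    pass


class SchemaGraph:
    def __init__(self):
        # target column -> (custom model name, list of input columns)
        self.inputs = {}

    def _reaches(self, start, goal):
        """Return True if `goal` is reachable from `start` via input edges."""
        stack, seen = [start], set()
        while stack:
            col = stack.pop()
            if col == goal:
                return True
            if col in seen:
                continue
            seen.add(col)
            stack.extend(self.inputs.get(col, (None, []))[1])
        return False

    def add_model(self, target, model, inputs):
        # Modeling `target` from `inputs` creates a cycle exactly when some
        # input is `target` itself or already depends on `target`.
        for col in inputs:
            if self._reaches(col, target):
                raise CyclicDependencyError(
                    "can't have cyclic dependencies: %s depends on %s"
                    % (col, target))
        self.inputs[target] = (model, list(inputs))


graph = SchemaGraph()
graph.add_model("foo", "LogisticRegression", ["bar", "baz"])
graph.add_model("quux", "LinearRegression", ["foo", "bar", "baz"])
try:
    graph.add_model("bar", "LinearRegression", ["foo", "quux"])
    rejected = False
except CyclicDependencyError:
    rejected = True  # foo already takes bar as an input
```

The check runs before the schema is modified, so a rejected statement leaves the existing dependencies untouched, which matches the "Ignoring!" behaviour in the snippet.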

EXPLANATION

In principle this means that e.g.:

Right now all we need are “good enough” approximations to the optimal Bayesian thing. Ultimately we will also want to support the full Bayesian interface; it will actually be tractable sometimes.

A GOOD-ENOUGH APPROXIMATION FOR A LAUNCH

IMPLEMENTATION IDEAS

Ultimately it would be great to treat custom models as custom meta-models. I currently think this requires:

I had naively assumed this was too much work for a first launch, but it would be great if it was the right strategy from the beginning. Until we have an example plugin meta-model, nobody will believe that meta-models really are an open set, and people will think BayesDB is just “crosscat + some plugins”.

For reference, here’s my internal model for generators (and my starting point for the relevant math in the paper):

creating a generator

using a generator

(*) if we want all independencies, including those that are not implied by the model structure but happen to be true because of the model parameters, we need to estimate MI (which reduces to simulate and predictive) and check for 0s.
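
To make the footnote concrete, here is a rough Python sketch of estimating MI from only simulate and predictive (log density) calls, then checking for values near 0. The toy `BivariateGaussian` generator and the function `estimate_mi` are assumptions made up for illustration; a real meta-model would supply its own simulate and predictive, and its marginal densities may themselves need approximating.

```python
import math
import random


class BivariateGaussian:
    """Toy generator (hypothetical): two standard normals with correlation rho."""

    def __init__(self, rho):
        self.rho = rho

    def simulate(self, rng):
        x = rng.gauss(0.0, 1.0)
        y = self.rho * x + math.sqrt(1.0 - self.rho ** 2) * rng.gauss(0.0, 1.0)
        return x, y

    def logpdf_joint(self, x, y):
        r = self.rho
        q = (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)
        return -math.log(2.0 * math.pi) - 0.5 * math.log(1.0 - r * r) - 0.5 * q

    def logpdf_marginal(self, v):
        return -0.5 * math.log(2.0 * math.pi) - 0.5 * v * v


def estimate_mi(gen, n, rng):
    """Monte Carlo estimate of E[log p(x,y) - log p(x) - log p(y)]."""
    total = 0.0
    for _ in range(n):
        x, y = gen.simulate(rng)
        total += (gen.logpdf_joint(x, y)
                  - gen.logpdf_marginal(x)
                  - gen.logpdf_marginal(y))
    return total / n


rng = random.Random(0)
mi_independent = estimate_mi(BivariateGaussian(0.0), 1000, rng)
mi_dependent = estimate_mi(BivariateGaussian(0.8), 20000, rng)
exact_dependent = -0.5 * math.log(1.0 - 0.8 ** 2)  # closed form for this toy
```

Because the estimate is noisy, "checking for 0s" in practice means comparing against a small threshold rather than testing for exact equality.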

relationship to SPs

There is a mapping between meta-models and higher-order Venture SPs:

(make-my-custom-meta-model) => (observe-row simulate-row predictive-row structural-dependence)

where observe-row, etc, are all SPs that share a single latent state that stores all the parameters of the meta-model.
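
As a rough Python analogue of that mapping (not Venture, and not an existing API), a constructor can return four closures that all read and write one shared state. The toy model here, independent unit-variance Gaussians per column with the mean set to the running sample mean, and every name in the block are assumptions chosen only to keep the sketch short.

```python
import math
import random


def make_toy_meta_model(columns, rng=None):
    """Return (observe_row, simulate_row, predictive_row, structural_dependence).

    Hypothetical toy: each column is an independent Normal(mean, 1), where
    mean is the running sample mean of the observed values.
    """
    rng = rng or random.Random(0)
    # The single latent state shared by all four procedures.
    state = {c: {"n": 0, "sum": 0.0} for c in columns}

    def _mean(col):
        s = state[col]
        return s["sum"] / s["n"] if s["n"] else 0.0

    def observe_row(row):
        for col, value in row.items():
            state[col]["n"] += 1
            state[col]["sum"] += value

    def simulate_row():
        return {c: rng.gauss(_mean(c), 1.0) for c in columns}

    def predictive_row(row):
        # Log density of the given cells under the current state.
        return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (v - _mean(c)) ** 2
                   for c, v in row.items())

    def structural_dependence(col_a, col_b):
        # Columns are modeled independently, so only a column and itself
        # are structurally dependent.
        return col_a == col_b

    return observe_row, simulate_row, predictive_row, structural_dependence


observe_row, simulate_row, predictive_row, structural_dependence = \
    make_toy_meta_model(["x", "y"])
observe_row({"x": 1.0, "y": 3.0})
observe_row({"x": 3.0, "y": 5.0})
sample = simulate_row()
```

Observations made through `observe_row` change what `simulate_row` and `predictive_row` return, which is the shared-latent-state behaviour described above.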

This could make it easy to prototype & test new meta-models in Venture. If you want to discuss this further I’d be delighted to. It isn’t necessary pre-launch but is likely to be a core feature soon afterwards.

It also will help pin down a real, efficient version of a “foreign SP interface”, which we may decide to pin down this summer. If we have it, we can then unify a great deal of testing & profiling infrastructure, and also have a canonical library of “probability distributions” and “fast inference primitives”, etc.

fsaad commented 9 years ago

(updated comment with formatting for easier readability)

fsaad commented 9 years ago

I am going to assume that custom model, custom generator, and foreign predictor all mean the same thing.

Question 1: What is the interface that the CUSTOM MODEL Regression exposes?

The latest metamodels.pdf document refers to this object as CUSTOM GENERATOR Regression rather than CUSTOM MODEL, which suggests that we have two options:

Since Regression is a generator that can be learned from data, it makes sense for it to live in the default metamodel. I can imagine a Bayesian regression in which approximate probabilistic inference (hence analyze) makes sense for Regression, but for an OLS regression analyze will converge in one step (assuming no missing regressors, i.e. no imputations) and further analysis will not improve the fit (unless we insert new observations).
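
A small Python sketch of the OLS point, assuming a single regressor and a hypothetical `OLSPredictor` class with `analyze` and `predict` methods (these names are illustrative, not an existing bayeslite interface). The closed-form fit means a second `analyze` call on the same data returns the same parameters.

```python
class OLSPredictor:
    """Hypothetical one-regressor OLS predictor; analyze is a closed-form fit."""

    def __init__(self):
        self.slope = None
        self.intercept = None

    def analyze(self, xs, ys):
        # Closed-form least squares: one pass, no iterative inference needed.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, x):
        return self.intercept + self.slope * x


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
model = OLSPredictor()
model.analyze(xs, ys)
first_fit = (model.slope, model.intercept)
model.analyze(xs, ys)  # a second analyze step changes nothing
second_fit = (model.slope, model.intercept)
```

Only new observations would move the fit, which is why repeated analyze steps are meaningful for a Bayesian regression with approximate inference but not for plain OLS.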

fsaad commented 8 years ago

Under development in bdbcontrib/src/foreign.

tibbetts commented 8 years ago

This wants to come back to trunk post gpm refactor.

fsaad commented 8 years ago

@tibbetts the branch has been merged in bdbcontrib, should I close this issue and set a separate one for 'migrate foreign predictor to trunk'?

tibbetts commented 8 years ago

Just close this.