Why is a given row atypical?

probcomp / bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

http://probcomp.csail.mit.edu/software/bayesdb

Apache License 2.0


Why is a given row atypical? #77

Open gregory-marton opened 9 years ago

gregory-marton commented 9 years ago

I'd love to have a language feature that, after I find atypical rows, tells me what portion of their atypicality is accounted for by which columns. In other words, what's atypical about this row?

I'm not sure how this should be requested in BQL; suggestions welcome. Perhaps EXPLAIN TYPICALITY where ...?

I'm not sure how best to implement it. My first thought would be to ask, for each cell, the probability of the cell having that value given all the other cells, and to report "portion of weirdness" as [1 - P(cell | row - cell)] / \sum_{i \in row} [1 - P(cell_i | row - cell_i)].

Initial plan is to write something to do this in Python first, then work it into the rest of the language.
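A minimal Python sketch of the "portion of weirdness" normalization described above. It assumes the per-cell conditional probabilities P(cell_i | row - cell_i) have already been obtained from some model and are passed in as plain numbers; the function name `weirdness_shares` and the example column names are hypothetical and not part of bayeslite.

```python
def weirdness_shares(cond_probs):
    """Split a row's atypicality across its columns.

    cond_probs maps column name -> P(cell | all other cells in the row).
    These values are assumed to come from a model; nothing here computes them.
    """
    for col, p in cond_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {col!r} out of range: {p}")

    # Per-cell "weirdness" is 1 - P(cell | rest of row).
    surprise = {col: 1.0 - p for col, p in cond_probs.items()}
    total = sum(surprise.values())
    if total == 0.0:
        # Every cell is fully expected, so there is nothing to attribute.
        return {col: 0.0 for col in cond_probs}

    # Normalize so the shares sum to 1 across the row.
    return {col: s / total for col, s in surprise.items()}


# Hypothetical row: made-up conditional probabilities for three columns.
row = {"age": 0.9, "height": 0.5, "income": 0.1}
shares = weirdness_shares(row)
for col, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{col}: {share:.2f}")
```

One caveat with this sketch: for continuous columns a model would typically report a density, which can exceed 1, so 1 - P only makes sense as written for discrete probabilities.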

fsaad commented 9 years ago

The metamodel interface delegates identification of atypical rows to the metamodel (see code here: https://github.com/mit-probabilistic-computing-project/bayeslite/blob/master/src/metamodel.py#L157), so my inclination is that the explanation should be metamodel-specific.

vkmvkmvkmvkm commented 9 years ago

Feras and I are working on revising the generator interface against which BQL primitives (e.g. dependence probability, typicality) can be defined. I propose we defer revising typicality for 1-2 weeks so that it can benefit from these revisions (and serve as a test case for the extensibility of our implementation).

On Monday, July 6, 2015, F Saad notifications@github.com wrote:

The metamodel interface delegates identification of atypical rows to the metamodel (see code here https://github.com/mit-probabilistic-computing-project/bayeslite/blob/master/src/metamodel.py#L157), so my inclination is that the explanation should be meta-model specific.


gregory-marton commented 9 years ago

Typicality has been deleted. @fsaad, how should this now be recast? I still want to know why a row has low probability according to a model.