Open gregory-marton opened 9 years ago
The metamodel interface delegates identification of atypical rows to the metamodel (see code here), so my inclination is that the explanation should be metamodel-specific.
Feras and I are working on revising the generator interface against which BQL primitives (e.g. dependence probability, typicality) can be defined. I propose we defer revising typicality for 1-2 weeks so that it can benefit from these revisions and serve as a test case for the extensibility of our implementation.
On Monday, July 6, 2015, F Saad notifications@github.com wrote:
The metamodel interface delegates identification of atypical rows to the metamodel (see code here: https://github.com/mit-probabilistic-computing-project/bayeslite/blob/master/src/metamodel.py#L157), so my inclination is that the explanation should be metamodel-specific.
Typicality has been deleted. @fsaad, how should this now be recast? I still want to know why a row has low probability according to a model.
I'd love to have a language feature that, after I find atypical rows, tells me what portion of their atypicality is accounted for by which columns. In other words, what's atypical about this row?
I'm not sure how this should be requested in BQL; suggestions welcome. Perhaps EXPLAIN TYPICALITY where ...?
I'm not sure how best to implement it. My first thought would be to ask, for each cell, the probability of that cell having its value given all the other cells in the row, and to report the "portion of weirdness" as [1 - P(cell | row - cell)] / \sum_{i \in row} [1 - P(cell_i | row - cell_i)], where "row - cell" means the row with that cell left out.
The initial plan is to write something to do this in Python first, then work it into the rest of the language.
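A minimal Python sketch of the "portion of weirdness" normalization, to make the proposal concrete. It is not bayeslite code: the function name `weirdness_portions`, the dict-based input, and the example probabilities are all hypothetical. It assumes the per-cell conditional probabilities P(cell | row - cell) have already been obtained from some model and lie in [0, 1], which would not hold as-is for densities of continuous columns.

```python
def weirdness_portions(cond_probs):
    """Return each column's share of a row's total weirdness.

    cond_probs maps a column name to P(cell | rest of row), assumed to be
    a probability in [0, 1] supplied by some model (hypothetical input).
    """
    for col, p in cond_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability for %r out of range: %r" % (col, p))

    # Per-cell weirdness is 1 - P(cell | rest of row).
    weirdness = {col: 1.0 - p for col, p in cond_probs.items()}
    total = sum(weirdness.values())

    # If every cell is fully expected, nothing is weird; avoid dividing by 0.
    if total == 0.0:
        return {col: 0.0 for col in cond_probs}

    return {col: w / total for col, w in weirdness.items()}


# Made-up conditional probabilities for a single row.
row_probs = {"age": 0.9, "height": 0.5, "income": 0.1}
portions = weirdness_portions(row_probs)

# Print columns from most to least responsible for the row's atypicality.
for col, share in sorted(portions.items(), key=lambda kv: -kv[1]):
    print("%-8s %.3f" % (col, share))
```

In this made-up row, `income` carries the largest share because its value is the least probable given the other cells. The shares sum to 1 whenever at least one cell has probability below 1.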