lumen-org / modelbase

A SQL-like interface for Python and the web to query probabilistic machine learning models and their data.
GNU Lesser General Public License v3.0

Data local predictions for PyMC3 models #72

Open jong42 opened 5 years ago

jong42 commented 5 years ago

The data local predictions facet does not work for the ProbabilisticPyMC3Model class yet. No error message is shown; nothing appears on the screen when the box for this facet is ticked.

nandaloo commented 5 years ago

This is not a bug but an effect of how ProbabilisticPyMC3Model currently answers density queries. The data local predictions facet causes the model to be conditioned on (a set of) particular values. Conditioning in ProbabilisticPyMC3Model is implemented as data slicing, and for conditions on single values the slice is almost always empty. The query therefore returns nan for the prediction values, and by convention the frontend then shows nothing.
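To illustrate the effect, here is a minimal, dependency-free sketch. It does not use modelbase or PyMC3: `samples`, `condition_by_slicing` and `predict_y` are hypothetical stand-ins for the posterior samples and the slicing-based conditioning described above.

```python
import math
import random

random.seed(0)

# Stand-in for posterior samples of two continuous variables (x, y).
samples = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(1000)]


def condition_by_slicing(samples, x_value):
    """Keep only the samples whose x equals x_value exactly."""
    return [s for s in samples if s[0] == x_value]


def predict_y(samples):
    """Mean of y over the remaining samples; nan if no sample is left."""
    if not samples:
        return math.nan
    return sum(y for _, y in samples) / len(samples)


# Condition on a single point value, as the facet does.
sliced = condition_by_slicing(samples, 0.5)
prediction = predict_y(sliced)
print(len(sliced), prediction)
```

Because the variables are continuous, an exact match with the conditioning value is practically never found, so the slice is empty and the prediction is nan.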

nandaloo commented 5 years ago

To fix this, the ProbabilisticPyMC3Model class could instead use a KDE over the posterior samples. It already uses a KDE, but only creates it when queried for a density value. Conditioning, however, is not done on the KDE but on the posterior samples (which are then the input for any following density query).

jong42 commented 5 years ago

The reason for this behaviour is that the model is conditioned on point values of the training data. Conditioning on a point value, however, discards every posterior sample, since no sample has exactly this value. We could fix this by conditioning on the KDE model instead of the PyMC3 samples. This would mean that the _fit() method would have to learn the KDE model instead of just generating the samples.
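A rough sketch of what conditioning on the KDE could look like, assuming a Gaussian product kernel with one shared, hand-picked bandwidth. All names here (`gaussian_kernel`, `condition_weights`, `conditional_density`, `conditional_mean`) are hypothetical and not part of modelbase. Instead of discarding samples, each sample is weighted by how close its x is to the conditioning value:

```python
import math
import random


def gaussian_kernel(u, bandwidth):
    """Density of N(0, bandwidth^2) evaluated at u."""
    z = u / bandwidth
    return math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))


def condition_weights(samples, x_value, bandwidth):
    """Normalised weight of each sample after conditioning the KDE on x = x_value."""
    raw = [gaussian_kernel(x_value - x, bandwidth) for x, _ in samples]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("x_value is too far from every sample for this bandwidth")
    return [w / total for w in raw]


def conditional_density(samples, x_value, y, bandwidth):
    """p(y | x = x_value) under the product-kernel KDE."""
    weights = condition_weights(samples, x_value, bandwidth)
    return sum(w * gaussian_kernel(y - yi, bandwidth)
               for w, (_, yi) in zip(weights, samples))


def conditional_mean(samples, x_value, bandwidth):
    """E[y | x = x_value] under the product-kernel KDE (a weighted mean of y)."""
    weights = condition_weights(samples, x_value, bandwidth)
    return sum(w * yi for w, (_, yi) in zip(weights, samples))


random.seed(0)

# Stand-in for posterior samples: y depends linearly on x plus noise.
samples = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    samples.append((x, 2.0 * x + random.gauss(0.0, 0.5)))

mean_y = conditional_mean(samples, 0.5, bandwidth=0.2)
density_at_1 = conditional_density(samples, 0.5, 1.0, bandwidth=0.2)
print(mean_y, density_at_1)
```

With this approach a point condition yields a finite prediction, because every sample contributes with a weight rather than being kept or dropped. The bandwidth of 0.2 is an arbitrary choice for the illustration; a real implementation would need a principled bandwidth selection.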

jong42 commented 5 years ago

Unfortunately, the KDE model class currently uses the exact same principle for conditioning and marginalizing: it slices the training data and then computes a KDE object on the remaining data.