lumen-org / modelbase

A SQL-like interface for python and the web to query probabilistic machine learning models and its data.
GNU Lesser General Public License v3.0

Simplify sampling when independent variables are not changed #73

Open jong42 opened 5 years ago

jong42 commented 5 years ago

Previously it was possible to set an arbitrary number of posterior samples, so drawing without replacement made sense if the number of posterior samples was higher or lower than the number of training data points. However, an error was later discovered that could appear in certain models if the sizes of the training data and the posterior samples differed (see commit ed00453e38e7ea911105c93502342c3cd6aed093). Since then, the number of posterior samples has been fixed to the size of the training data, which makes some of the previous strategies redundant. If we no longer change the independent variables, the whole workaround with the shared variables could be dropped.

nandaloo commented 5 years ago

I understand that the sampling could now be simplified, but I do see a serious problem with limiting the number of posterior samples in such a drastic way. The quality of almost every query against the posterior distribution depends on the number of posterior samples, as this is the only way of accessing the distribution. We really should find a way to allow arbitrarily many posterior samples.
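
As a rough illustration of why the sample count matters, here is a minimal toy sketch. It assumes a standard normal as a stand-in for the posterior and estimates one query, P(X > 1), from samples only. The function names are hypothetical and not part of modelbase; the point is only that the spread of the estimate shrinks as the number of samples grows.

```python
import random
import statistics


def estimate_probability(n_samples, rng):
    """Estimate P(X > 1) for X ~ Normal(0, 1) using only n_samples draws.

    The standard normal is an assumed stand-in for a posterior that can
    only be accessed through its samples.
    """
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > 1.0)
    return hits / n_samples


def spread_of_estimates(n_samples, n_repeats=200, seed=0):
    """Standard deviation of the estimate over repeated sampling runs."""
    rng = random.Random(seed)
    estimates = [estimate_probability(n_samples, rng) for _ in range(n_repeats)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    # Fewer samples give a noisier answer to the same query.
    for n in (50, 500, 5000):
        print(n, spread_of_estimates(n))
```

For a query like this the spread falls roughly with the square root of the sample count, so fixing the number of posterior samples to the training set size also fixes the attainable accuracy of the query.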