probcomp / crosscat

A domain-general, Bayesian method for analyzing high-dimensional data tables
http://probcomp.csail.mit.edu/crosscat/
Apache License 2.0

simple_predictive_sample_unobserved draws randomly even for constrained columns #34

Closed by riastradh-probcomp 9 years ago

riastradh-probcomp commented 9 years ago

If you constrain column 3 to be 42 and then sample column 3, Crosscat draws a random value instead of returning 42, as one might expect. This produces strange output in bayesdb, for example:

bayeslite> SIMULATE Murder FROM states_cc GIVEN Murder = 1 LIMIT 4;
Murder
-------------
2.96562662884
9.17781629692
4.15232993703
2.88644682395

riastradh-probcomp commented 9 years ago

There is a candidate fix in the 20150924-riastradh-drawconstraint branch, with an OK from @axch: in simple_predictive_sample_unobserved, a constraint now forces the particular value, not just the particular cluster from which to draw. What remains is simple_predictive_sample_observed:

I suspect it doesn't matter for observed rows, because we never ask PREDICT x GIVEN x = 0 expecting it to give 0; in fact, bayeslite PREDICT explicitly omits any current value for the requested column from the constraints. But I'm not sure that no application of simple_predictive_sample_observed passes a constraint on the column it is asking about.
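
To illustrate the behavior the candidate fix aims for, here is a minimal sketch in Python. It is not the code from the branch and does not use Crosscat's real signatures: sample_unobserved, toy_draw, and their parameters are hypothetical names made up for this example. The only point is that a queried column which also appears in the constraints comes back with its constrained value instead of a fresh random draw.

```python
import random


def sample_unobserved(query_columns, constraints, draw_column, rng):
    """Toy stand-in for a predictive sampler (hypothetical, not Crosscat's API).

    query_columns: list of column indices to sample.
    constraints:   dict mapping column index -> fixed value.
    draw_column:   callable (column, constraints, rng) -> random draw.
    """
    sample = []
    for column in query_columns:
        if column in constraints:
            # The queried column is constrained, so its value is already
            # determined: return it rather than drawing at random.
            sample.append(constraints[column])
        else:
            # Unconstrained column: fall back to the ordinary random draw.
            sample.append(draw_column(column, constraints, rng))
    return sample


def toy_draw(column, constraints, rng):
    # Hypothetical sampler used only for this example; a real model would
    # condition on the constraints instead of ignoring them.
    return rng.gauss(0.0, 1.0)


rng = random.Random(0)
row = sample_unobserved([3, 5], {3: 42}, toy_draw, rng)
```

Here row[0] is 42 because column 3 is constrained, while row[1] is a random draw for the unconstrained column 5.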

riastradh-probcomp commented 9 years ago

The 20150924-riastradh-drawconstraint branch now addresses simple_predictive_sample_observed as well, with an essay arguing why constraining the queried column is not sensible there and inviting arguments to the contrary before we reject such queries.

riastradh-probcomp commented 9 years ago

Fixed in v0.1.29, abc491c56be9ea639210c2b0c2aeec0aef4de372.