probcomp / crosscat

A domain-general, Bayesian method for analyzing high-dimensional data tables
http://probcomp.csail.mit.edu/crosscat/
Apache License 2.0

Revising algorithm for computing joint pdf #65

Open fsaad opened 9 years ago

fsaad commented 9 years ago

Currently, predictive_probability is computed by invoking the chain rule on the legacy simple_predictive_probability. @axch suggests the following alternative implementation:

On second thought, crosscat should have a better algorithm for doing this:

  • Group the query columns by view
  • For each view that appears
    • Compute the cluster logps the way simple_predictive_probability does
    • For each cluster
      • Compute the sum of the component_model.calc_element_predictive_logp_constrained across relevant component models and add it to the cluster logp
  • Return the logsumexp of all the above.
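
The steps above can be sketched in a few lines of Python. This is a toy illustration, not crosscat code: `joint_logpdf`, `column_to_view`, `cluster_logps`, and `component_params` are hypothetical names, a Gaussian `normal_logpdf` stands in for `component_model.calc_element_predictive_logp_constrained`, and the cluster logps are assumed to be given rather than computed the way `simple_predictive_probability` does. It also assumes one reading of the last step: logsumexp over clusters within each view, then a sum across views.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def normal_logpdf(x, mu, sigma):
    """Toy stand-in for a per-cluster component model's predictive logp."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def joint_logpdf(query, column_to_view, cluster_logps, component_params):
    """Joint logpdf of query = {column: value} for a hypothetical row.

    column_to_view:   {column: view id}
    cluster_logps:    {view id: [log weight of each cluster]}  (assumed given)
    component_params: {column: [(mu, sigma) for each cluster]} (toy Gaussians)
    """
    # Group the query columns by view.
    by_view = defaultdict(list)
    for col, x in query.items():
        by_view[column_to_view[col]].append((col, x))

    total = 0.0
    for view, cells in by_view.items():
        per_cluster = []
        for k, cluster_logp in enumerate(cluster_logps[view]):
            # Sum the element logps over this view's query columns and
            # add the result to the cluster logp.
            element_logp = sum(
                normal_logpdf(x, *component_params[col][k]) for col, x in cells
            )
            per_cluster.append(cluster_logp + element_logp)
        # Marginalize over cluster assignments within the view.
        total += logsumexp(per_cluster)
    # Assumption: distinct views are independent, so their logps add.
    return total


# Hypothetical two-view state: columns "a" and "b" share view 0, "c" is in view 1.
COLUMN_TO_VIEW = {"a": 0, "b": 0, "c": 1}
CLUSTER_LOGPS = {
    0: [math.log(0.3), math.log(0.7)],
    1: [math.log(0.5), math.log(0.5)],
}
COMPONENT_PARAMS = {
    "a": [(0.0, 1.0), (3.0, 0.5)],
    "b": [(-1.0, 2.0), (1.0, 1.0)],
    "c": [(0.0, 1.0), (5.0, 1.0)],
}

logp = joint_logpdf({"a": 0.5, "b": 0.2, "c": 4.0},
                    COLUMN_TO_VIEW, CLUSTER_LOGPS, COMPONENT_PARAMS)
```

Each view is visited once regardless of how many of its columns are queried, whereas the chain-rule approach issues one conditional query per column.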

In other words, retain the structure of simple_predictive_probability_unobserved (mutatis mutandis for observed) but expand it to handle multiple columns.

The reason this should be OK is independence of columns given cluster assignments.
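
Written out, the independence argument amounts to the factorization below. The notation is an assumption made for illustration and does not come from the crosscat source: $Q_v$ is the set of query columns in view $v$, $z_v$ is the cluster assignment of the row in view $v$, and the outer product over views additionally assumes that distinct views are independent.

```latex
p(x_Q) \;=\; \prod_{v} \; \sum_{k} \; p(z_v = k) \prod_{c \in Q_v} p(x_c \mid z_v = k)
```

Taking logs, the inner product becomes the sum of element logps added to the cluster logp, and the sum over $k$ becomes a logsumexp.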

The present implementation can be retained as a test, possibly at the Bayeslite level: the logpdf_joint of any metamodel should respect the chain rule exactly as computed here.
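
Such a test might look roughly like the sketch below, shown on a toy single-view Gaussian mixture rather than on crosscat or Bayeslite. All names here (`LOG_WEIGHTS`, `PARAMS`, `conditional_logpdf`, `joint_logpdf_direct`, `joint_logpdf_chain`) are hypothetical; the conditional is computed by reweighting the clusters given the already-constrained columns, which is assumed to mirror what the chain-rule implementation does.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normal_logpdf(x, mu, sigma):
    """Toy stand-in for a component model's predictive logp."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


# Hypothetical single-view mixture: log cluster weights and, for each
# column, one (mu, sigma) pair per cluster.
LOG_WEIGHTS = [math.log(0.3), math.log(0.7)]
PARAMS = {"a": [(0.0, 1.0), (3.0, 0.5)], "b": [(-1.0, 2.0), (1.0, 1.0)]}


def conditional_logpdf(col, x, constraints):
    """log p(x_col | constraints), using cluster weights updated by constraints."""
    logw = [
        lw + sum(normal_logpdf(v, *PARAMS[c][k]) for c, v in constraints.items())
        for k, lw in enumerate(LOG_WEIGHTS)
    ]
    norm = logsumexp(logw)
    return logsumexp(
        [lw - norm + normal_logpdf(x, *PARAMS[col][k]) for k, lw in enumerate(logw)]
    )


def joint_logpdf_direct(query):
    """Joint logpdf computed in one pass over the clusters."""
    return logsumexp([
        lw + sum(normal_logpdf(v, *PARAMS[c][k]) for c, v in query.items())
        for k, lw in enumerate(LOG_WEIGHTS)
    ])


def joint_logpdf_chain(query):
    """Joint logpdf as a chain-rule sum of one-column conditionals."""
    total, seen = 0.0, {}
    for col, x in query.items():
        total += conditional_logpdf(col, x, seen)
        seen[col] = x
    return total


query = {"a": 0.5, "b": 0.2}
assert math.isclose(joint_logpdf_direct(query), joint_logpdf_chain(query),
                    rel_tol=1e-9)
```

The same check could be repeated with the columns in a different order, since the chain rule should give the same joint for every ordering.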

fsaad commented 9 years ago

Code referred to as "here": https://github.com/probcomp/crosscat/commit/950787f1a2e812763d7447469d3827e6ce1495f4#diff-1c75fbbdfb035344508f37d7dce685b5R74