probcomp / crosscat

A domain-general, Bayesian method for analyzing high-dimensional data tables
http://probcomp.csail.mit.edu/crosscat/
Apache License 2.0

Speed up inference (and ensure ergodicity) in the presence of ENSURE DEPENDENT #76

Closed (axch closed 7 years ago)

axch commented 9 years ago

by block proposing the dependent column cliques.

Apparently the current implementation of ENSURE DEPENDENT still proposes column moves one at a time and zeroes out the probability of any move that violates a constraint. Moving one column of a dependent pair to a different view always separates it from its partner, so every such move has probability zero, and a column that is DEPENDENT on another can never change views.

The better proposal mechanism is conceptually simple: just propose moving such a collection of columns as a group. Actual implementation difficulty is unknown.
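As a rough illustration of what a block proposal could look like, here is a minimal Python sketch. It is not the crosscat implementation: the names `block_crp_weights` and `propose_block_move` are hypothetical, views are assumed to be integer ids, and only the CRP prior over views is used. A real kernel would also multiply each weight by the marginal likelihood of the block's data under that view. Under the CRP, seating k columns together in a view that already holds n columns has weight n(n+1)...(n+k-1), and seating them together in a fresh view has weight alpha * (k-1)!.

```python
import random
from fractions import Fraction


def block_crp_weights(view_sizes, block_size, alpha):
    """CRP prior probability of seating a whole block in each view.

    view_sizes: dict view_id -> number of columns, with the block removed.
    Returns dict view_id -> probability; the key None means a fresh view.
    """
    weights = {}
    for view, n in view_sizes.items():
        w = Fraction(1)
        for j in range(block_size):      # n * (n+1) * ... * (n+k-1)
            w *= n + j
        weights[view] = w
    w_new = Fraction(alpha)
    for j in range(1, block_size):       # alpha * 1 * 2 * ... * (k-1)
        w_new *= j
    weights[None] = w_new
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def propose_block_move(assignment, block, alpha, rng=random):
    """Move every column in `block` to one jointly sampled view (prior only)."""
    # Remove the whole block first, then count the remaining columns per view.
    rest = {c: v for c, v in assignment.items() if c not in block}
    sizes = {}
    for v in rest.values():
        sizes[v] = sizes.get(v, 0) + 1
    probs = block_crp_weights(sizes, len(block), alpha)
    views = list(probs)
    choice = rng.choices(views, weights=[float(probs[v]) for v in views])[0]
    if choice is None:
        choice = max(list(sizes) + [-1]) + 1   # fresh integer view id
    new_assignment = dict(rest)
    for c in block:
        new_assignment[c] = choice
    return new_assignment
```

Because the block is removed and reseated as a unit, the dependence constraint holds after every move, and the block can reach any view, including a new one.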

fsaad commented 7 years ago

Looking into this issue. Thoughts:

Adjusting the CRP probabilities

Consider the following example with three customers c1, c2, and c3, under the constraint ENSURE INDEPENDENT (c1, c2), where a is the CRP concentration parameter:

Scenario A: sample in the order (c1, c2, c3)

Pr[c1=1] = 1
sample --> [[c1=1]]

Pr[c2=1 | c1] = 0 (by independence constraint)
Pr[c2=2 | c1] = 1 (probabilities are normalized)
sample --> [[c2=2]]

Pr[c3=1 | c1, c2] = 1/(2+a)
Pr[c3=2 | c1, c2] = 1/(2+a)
Pr[c3=3 | c1, c2] = a/(2+a)
sample --> [[c3=3]]

==> Implies that the probability all customers are on separate tables is a/(2+a).

Scenario B: sample in the order (c3, c2, c1)

Pr[c3=1] = 1
sample --> [[c3=1]]

Pr[c2=1 | c3] = 1/(1+a)
Pr[c2=2 | c3] = a/(1+a)
sample --> [[c2=2]]

Pr[c1=1 | c2, c3] \propto 1
Pr[c1=2 | c2, c3] = 0 (by independence constraint)
Pr[c1=3 | c2, c3] \propto a
sample --> [[c1=3]]

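==> Implies that the probability all customers are on separate tables is a/(1+a) * a/(1+a), which differs from Scenario A's a/(2+a) for every a > 0, so the constrained probabilities depend on the sampling order.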
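The two scenarios can be checked by exact enumeration. The sketch below is only an illustration of the arithmetic, not crosscat code: `constrained_crp` is a hypothetical helper that seats customers sequentially, gives weight zero to any table holding a customer the newcomer must be independent of, and renormalizes the remaining weights, as in the scenarios above.

```python
from fractions import Fraction


def constrained_crp(order, independent, alpha):
    """Return {partition: probability} for sequential seating in `order`.

    independent: set of frozenset pairs that may not share a table.
    Forbidden tables get weight zero; the remaining weights are renormalized.
    """
    states = {(): Fraction(1)}  # each key is a tuple of tables (tuples of customers)
    for c in order:
        nxt = {}
        for tables, p in states.items():
            weights = []
            for t in tables:
                ok = all(frozenset((c, other)) not in independent for other in t)
                weights.append(Fraction(len(t)) if ok else Fraction(0))
            weights.append(Fraction(alpha))  # weight of opening a new table
            total = sum(weights)
            for i, w in enumerate(weights):
                if w == 0:
                    continue
                if i < len(tables):
                    new = tables[:i] + (tables[i] + (c,),) + tables[i + 1:]
                else:
                    new = tables + ((c,),)
                nxt[new] = nxt.get(new, Fraction(0)) + p * w / total
        states = nxt
    # Merge seatings that describe the same partition.
    out = {}
    for tables, p in states.items():
        key = frozenset(frozenset(t) for t in tables)
        out[key] = out.get(key, Fraction(0)) + p
    return out


all_separate = frozenset(frozenset([c]) for c in ("c1", "c2", "c3"))
constraint = {frozenset(("c1", "c2"))}
a = Fraction(1)

print(constrained_crp(("c1", "c2", "c3"), constraint, a)[all_separate])  # 1/3
print(constrained_crp(("c3", "c2", "c1"), constraint, a)[all_separate])  # 1/4
```

With a = 1 the two orders give a/(2+a) = 1/3 and (a/(1+a))^2 = 1/4, matching Scenarios A and B.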

Someone interested in modeling could develop a "constrained CRP" model that formalizes independence constraints, but for the time being (and given the title of this ticket) it makes sense to focus on the ENSURE DEPENDENT case, which has a straightforward resolution.

Patching the inference kernel for block proposals

The heavy lifting will happen in State.cpp. https://github.com/probcomp/crosscat/blob/25fad43cc08422314068c26e44c2c087bec07c17/cpp_code/src/State.cpp#L236-L244

fsaad commented 7 years ago

Resolved by https://github.com/probcomp/crosscat/commit/3873eee2f3d32fddfe63e6fd01417b7f44fe38b9