Closed leocasarsa closed 7 years ago
In addition to the renderings above, can you please provide minimal working examples of the analysis scripts used to generate them? I wonder whether the same qualitative behavior (roughly one view, one large noisy cluster) remains when varying the primitive CGPM used to model the variables between Bernoulli, Categorical, and Normal. There might be an issue in the implementation of the collapsed samplers for one (or both) of the former two.
Investigating further, it seems to be related to the difference in the hyperparameter grids for Categorical (and Bernoulli) between cgpm-crosscat and lovecat.
In lovecat, the alpha_grid is log-spaced from 1 to len(dataset).
https://github.com/probcomp/crosscat/blob/master/cpp_code/src/utils.cpp#L421
In cgpm, the alpha-grid is log-spaced from 1 / len(dataset) to len(dataset).
https://github.com/probcomp/cgpm/blob/master/src/primitives/categorical.py#L110 (Categorical)
https://github.com/probcomp/cgpm/blob/master/src/primitives/bernoulli.py#L113-L116 (Bernoulli)
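To make the difference between the two grids concrete, here is a minimal sketch using only the standard library. The helper `log_linspace`, the dataset size `n_rows`, and the grid resolution `n_grid` are hypothetical names and values chosen for illustration; they are not taken from either codebase, and only the grid endpoints follow the description above.

```python
import math


def log_linspace(low, high, n):
    """Return n points evenly spaced on a log scale from low to high."""
    if n == 1:
        return [low]
    step = (math.log(high) - math.log(low)) / (n - 1)
    return [math.exp(math.log(low) + i * step) for i in range(n)]


n_rows = 50   # hypothetical dataset size, stands in for len(dataset)
n_grid = 5    # hypothetical number of grid points

# Grid as described for lovecat: 1 to len(dataset).
lovecat_style = log_linspace(1.0, n_rows, n_grid)

# Grid as described for cgpm: 1 / len(dataset) to len(dataset).
cgpm_style = log_linspace(1.0 / n_rows, n_rows, n_grid)

print("lovecat-style grid:", [round(a, 3) for a in lovecat_style])
print("cgpm-style grid:   ", [round(a, 3) for a in cgpm_style])
```

The only point of the sketch is that the second grid contains values well below 1, which the first grid never reaches.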
For the Beta prior, small values of alpha and beta (below 1) make the density U-shaped, so it looks like a "smile", driving the parameters to the extreme values of 0 and 1.
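The "smile" shape can be checked numerically. This is a small sketch that evaluates the standard Beta density near the edge and at the middle of the unit interval; `beta_pdf` is a hypothetical helper written here for illustration, and the hyperparameter values are arbitrary examples rather than values from either grid.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


for a in (0.1, 1.0, 5.0):
    edge = beta_pdf(0.01, a, a)    # density close to 0
    middle = beta_pdf(0.5, a, a)   # density at the center
    print("alpha = beta = %.1f: edge %.4g, middle %.4g" % (a, edge, middle))
```

With alpha = beta = 0.1 the density near the edge exceeds the density at 0.5, with alpha = beta = 1 the density is flat, and with alpha = beta = 5 the mass concentrates around 0.5.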
Re the minimal working examples, sorry for skipping that message. I can provide you with the code in an hour if you still need it.
I found the scripts at:
https://probcomp-2.csail.mit.edu:8883/notebooks/animals_cc_experiment.ipynb
and am reproducing the test cases locally with potential fixes applied.
Migrating in from Slack
It turns out the original experiments run by @leocasarsa used a Bernoulli for cgpm and categorical for lovecat (which does not have a Bernoulli component model). The difference in the states ultimately came down to this fact; running inference on the animals dataset using a dirichlet-categorical rather than beta-bernoulli results in posterior states that are indistinguishable from those of lovecat.
It's due to the way that CrossCat implements the dirichlet-categorical: it forces a symmetric-dirichlet, so the beta-bernoulli sampler (which allows the beta hyperparameters alpha and beta to be arbitrary, i.e. not the same) is not a special case of the dirichlet-categorical, and the plots produced above came from two samplers with essentially different priors.
Rerunning all the experiments on the animals dataset using (i) normal and (ii) categorical component models produces qualitatively similar posterior samples using cgpm and lovecat in both cases.
Here is the dependence probability matrix (left cgpm, right lovecat) using categorical component models with 900 iterations of analysis, and the same row/column ordering:
This test case has been committed to https://github.com/probcomp/cgpm/blob/master/tests/graphical/animals.py
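For anyone reading along, the dependence probability between two variables is usually estimated as the fraction of posterior samples in which they are assigned to the same view. A minimal sketch of that computation, assuming each posterior sample is given as a list mapping column index to view id (the function name and the toy samples are hypothetical, not the cgpm API):

```python
def dependence_probability(view_assignments):
    """Fraction of samples in which each pair of columns shares a view.

    view_assignments: list of posterior samples, each a list where
    entry i is the view id of column i in that sample.
    """
    n_samples = len(view_assignments)
    n_cols = len(view_assignments[0])
    counts = [[0] * n_cols for _ in range(n_cols)]
    for assignment in view_assignments:
        for i in range(n_cols):
            for j in range(n_cols):
                if assignment[i] == assignment[j]:
                    counts[i][j] += 1
    return [[c / float(n_samples) for c in row] for row in counts]


# Three toy posterior samples over three columns.
samples = [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
D = dependence_probability(samples)
for row in D:
    print([round(p, 3) for p in row])
```

Comparing two such matrices under the same row/column ordering is what the side-by-side plot does visually.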
The artifacts produced are too large for GitHub, but we should save the .engine files and plots using Git LFS or something similar.
It would also be worth developing some intuition about why the asymmetric-dirichlet for the categorical prior results in dpmm-like posterior samples, as opposed to the cross-cutting partitions generated by the symmetric-dirichlet.
All renderings are saved in:
https://probcomp-2.csail.mit.edu:8883/tree/out
cc stands for cgpm-crosscat.