Closed marcoct closed 8 years ago
There is a semantic logic to this problem. Imagine a model with an unknown number of continuous latents, like an uncollapsed DPMM with continuous cluster parameters. One might imagine computing the global posterior at any given configuration of parameters, but what does that actually mean? The number is a probability density, that must be interpreted with respect to something like counting measure on all the discrete variables and Lebesgue measure on the continuous ones. What would it mean to compare two such numbers, if the number of clusters is different? Formally, the "probability" of a (infinite-precision) configuration with more clusters is infinitely smaller than the "probability" of one with fewer. But the density in both cases is a finite number, and in fact, the information of how many Lebesgue-measured quantities were involved is lost by the time that number is presented.
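To make the non-comparability concrete, here is a toy sketch in plain Python (not Venture, and not an actual DPMM): a "log joint density" with one discrete choice plus k continuous latents, each Normal(0, sigma) evaluated at 0. Re-expressing the continuous latents in different units multiplies each Lebesgue dimension by a different Jacobian factor, so the comparison between configurations with different k is not even invariant under a change of units.

```python
import math

# Toy model: one discrete choice with probability 1/2 (counting measure),
# plus k continuous latents, each Normal(0, sigma) evaluated at 0
# (a density w.r.t. Lebesgue measure on R^k).
def log_joint(k, sigma=1.0):
    log_p_discrete = math.log(0.5)
    log_normal_at_0 = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return log_p_discrete + k * log_normal_at_0

# In one choice of units, the configuration with fewer continuous
# latents has the higher log joint:
a1, b1 = log_joint(1), log_joint(2)                            # a1 > b1

# Re-expressing the same latents in units 1000x larger (values shrink,
# densities grow by 1000 per Lebesgue dimension) flips the comparison,
# because the two numbers pick up different Jacobian factors:
a2, b2 = log_joint(1, sigma=0.001), log_joint(2, sigma=0.001)  # a2 < b2
print(a1 > b1, a2 < b2)
```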
What should `global_log_joint` do in this circumstance?
(Note: `global_log_likelihood` remains sensible, because only the observations' base measure is relevant, and that is presumably the same across all points of comparison.)
> `global_log_likelihood` remains sensible

Well, except when it already isn't: `observe if (flip()) { uniform_continuous(0, 10) } else { uniform_discrete(0, 10) } = 5`
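A plain-Python sketch (not Venture) of why that observation is troublesome: the value 5 is possible under both branches, but one branch assigns it a probability *density* (w.r.t. Lebesgue measure) and the other a probability *mass* (w.r.t. counting measure). The discrete support 0..10 below is an illustrative assumption, not a statement about `uniform_discrete`'s actual bounds convention.

```python
p_flip = 0.5
density_continuous = 1.0 / 10.0  # uniform_continuous(0, 10): density at 5
mass_discrete = 1.0 / 11.0       # uniform_discrete: mass at 5, assuming 11
                                 # equally likely integers 0..10 (illustrative)

# A generic likelihood accumulator that naively mixes the branches adds
# quantities taken w.r.t. different base measures.  The resulting number
# has no fixed interpretation: re-expressing 5 meters as 500 centimeters
# would rescale the density term but not the mass term.
naive = p_flip * density_continuous + p_flip * mass_discrete
print(naive)
```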
Despite the measure issues, it is possible to adapt the code to compute this quantity anyway (each random choice can report its log probability (density) given its parents, summed together with the global log likelihood), with a large warning in the documentation for this method indicating that results may be difficult to interpret or compare in certain cases.
Having this quantity, in addition to the global log likelihood, will help us understand how our KL divergences and probability ratios compare to it, and its semantic issues may even be important for highlighting why a technique like our KL-based measurement, rather than the log joint, should be used as a debugging tool.
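The adaptation described above (each choice reporting its log density given its parents) might be sketched as follows. The trace representation and the function name `unsafe_global_log_joint` here are hypothetical stand-ins, not Venture's actual API.

```python
import math

# Hypothetical trace: each random choice records the log probability
# (or density) of its value given its parents, plus whether it is
# continuous (Lebesgue base measure) or discrete (counting measure).
choices = [
    {"name": "num_clusters", "log_p": math.log(0.25), "continuous": False},
    {"name": "mu_1",         "log_p": -0.92,          "continuous": True},
    {"name": "mu_2",         "log_p": -1.41,          "continuous": True},
]
log_likelihood = -3.2  # global log likelihood of the observations

def unsafe_global_log_joint(choices, log_likelihood):
    """Sum per-choice log densities plus the global log likelihood.

    WARNING (per the discussion above): the result mixes counting and
    Lebesgue base measures, so it is only safely comparable across
    traces with the same continuous dimensionality.
    """
    return sum(c["log_p"] for c in choices) + log_likelihood

print(unsafe_global_log_joint(choices, log_likelihood))
```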
Request: Create a second version of `global_log_joint` with a scary-sounding name like `unsafe_global_log_joint` (to discourage casual use) that doesn't crash when it encounters measure-theoretic problems.
The structural problem is that `regen` and `detach` automatically repropose brush from the prior, and do not compute the weight.
Ways this request could be satisfied anyway:

- Teach `regen` and `detach` to accept other kinds of proposals (e.g., the constant "use the current values") and compute their weights. This would be the most principled way to do it, but I don't want to architect it on a tight schedule.
- Teach `regen` and/or `detach` to accept a flag to just evaluate the prior density. I kinda don't want to do this, because `regen` and `detach` are subtle and covered in hacks already.
- Fix up the existing `getGlobalLogScore` method of `lite.Trace` and `new_cxx.Trace` (the latter defined in `pytrace.cxx`).

**Issues**
AAA (absorbing-at-applications) operators have a `logDensityOfData` method. So we can just invoke that on AAA nodes, and count 0 `logDensity` for nodes whose operator is an AAA node.

**Details, if adopting the custom proposal plan**
- Rename the `getGlobalLogScore` method with a more specific (and perhaps cautionary) name.
- `getGlobalLogScore` should probably just be updated.
- Expose `getGlobalLogScore` as an inference SP named something like `unsafe_global_log_joint`, e.g. via `engine_method_sp`.

@riastradh-probcomp, could you implement this? If the above discussion is not enough to go on, we can talk about it.
Fixed by #572.