Closed paciorek closed 10 months ago
@paciorek The logic makes sense - and the location of the added code seems ok, also.
My only possible holdup is that, with this change, we can have the situation where some samplers are including PP nodes in dependency calculations, while simultaneously other samplers are excluding PP nodes. Can this lead to incorrect inferences? I'm not 100% certain either way, but it seems very plausible to me that having part of the model (or, updating of part of the model) operate conditional on PP nodes as data, and other parts of the model being updated without conditioning on PP nodes could lead to incorrect inferences.
Is a confident answer to that ("yes, it's ok", or "no, there's in fact a problem with that") obvious to you?
Very good point - I should have thought of that and thanks for thinking carefully about this. I'm not confident.
One possibility is to check in configureMCMC
whether any CRP nodes are present and there are any predictive nodes and then turn MCMCusePredictiveDependenciesInCalculations
on. I think we should then leave it on, and tell the user, since we want any use of addSampler
to also have MCMCusePredictiveDependenciesInCalculations
be TRUE.
However, I think that is probably too clever (in that there are probably cases where it wouldn't be clever enough), and we should simply error out at the beginning of configureMCMC
and tell the user if they want to set up an MCMC on such a model they should set MCMCusePredictiveDependenciesInCalculations
themselves and rerun configureMCMC
.
Thoughts?
@paciorek This is a bit of a rabbit-hole, because the root cause issue was unequal-sized groups of CRP cluster nodes triggering an error, which itself was caused by some CRP dependencies being data and some not being data (and hence being PP; and thereafter excluding the non-data-PP nodes caused unequal groups).
So the situation seems more complex:
I've just sat for a minute, and tried to think about what's reasonable behavior here. These are the best thoughts I came up with, which are not extremely helpful:
MCMCusePredictiveDependenciesInCalculations = TRUE
may resolve this problem"@danielturek thanks for helping me think this through; this is helpful.
I agree that my proposed solution is conservative in that it may not be needed in some cases. So maybe the solution is to refine my proposed trapping behavior so that configureMCMC
looks for (1) a model with a dCRP node and (2) if there are any predictive nodes that are children of the "clusterNodes" (the nodes that are cluster parameters). If so, error out with a message saying that the MCMC might be feasible if the user sets the nimbleOption.
As far as what induces what, we really are expecting that there are equal group sizes and that the likely way that wouldn't happen currently is if some of the dependent nodes for a given group are predictive nodes. Other situations such as you mention are possible in principle but would require unusual model structures and should (fingers crossed) cause errors to be produced via all the checking in sampler_CRP
setup code.
What do you think of my revised proposal?
@paciorek This makes good sense to me. So:
configureMCMC
,MCMCusePredictiveDependenciesInCalculations = TRUE
This seems reasonable to me. Do you want to make this change?
Yes, I will do so.
This sets predictive nodes to be included in dependencies for sampler_CRP, which for logistical reasons and to ensure validity of sampling, imposes some requirements in terms of the number of dependent nodes for each stochastic index being sampled.
@danielturek a quick eye in terms of where I am setting the nimble option would be helpful.