nimble-dev / nimble

The base NIMBLE package for R
http://R-nimble.org
BSD 3-Clause "New" or "Revised" License
156 stars 23 forks source link

add option for ASIS sampling to RW sampler? #1437

Closed paciorek closed 4 months ago

paciorek commented 5 months ago

This modifies the RW sampler to allow for joint sampling of the target along with random effects whose mean is the target or whose sd is the target, i.e., a joint sampler for either mu, b[1:p] or sigma, b[1:p] where b[i] ~ dnorm(mu, sd = sigma).

The sampler adds sampling in the noncentered parameterization, assuming that the model is set up for centered parameterization and samplers are already being applied in the centered parameterization. In particular the sampler proposes mu* (or sigma*) and then deterministically sets b[i]* to be b[i] - mu + mu* or mu + (sigma*/sigma) (b[i] - mu). The Jacobian (in the latter case) is omitted because the code omits calculation of the b[i] prior from the Metropolis ratio, which the Jacobian would cancel with.

This is a special case (but covers much of the cases envisioned in the paper) of the ASIS sampling strategy of Yu and Meng (2011; 10.1198/jcgs.2011.203main). Hence naming the option 'ASIS'. That said, it's not very informative, but other options would probably be wordy (e.g, "joint_with_random_effects")

The main question here from an API perspective is whether we want this as a control option to the RW sampler. It is only relevant for some models and might be confusing to users. However, most of the code is identical to the RW sampler, so having as a separate sampler would involve a bunch of code duplication. @perrydv @danielturek, I would like feedback from you on this question.

Additional notes: 1.) TBD how much checking we do to ensure the random effect distribution is valid vs. telling the user what it should be and expecting them to check. 2.) It would be straightforward to extend to the case where b[i] ~ dnorm(mu + eta[i] + gamma*z[i], sigma) and the target is any of mu, gamma, or sigma. 3.) Adding an ASIS option to other scalar samplers, in particular slice, is straightforward (and I have draft code for it). Not sure if we want to do that too.

We are currently using this sampler to explore sampling in MSOMs with species-level random effects.

I cancelled CI testing for this as there is no testing code and in principle this has no effect on existing use of RW sampler.

danielturek commented 5 months ago

@paciorek Thanks for tagging me on this.

To address your highest level design questions upfront:

Another design option to consider: perhaps only having one control argument, called includeRE. If it has the default value of FALSE, operation is the usual RW sampler. If includeRE is provided as TRUE, then we automatically do some model processing and checking, and self-determine asisType and asisEffects. And, if includeRE = TRUE but the model structure is not amenable to this, we throw an error in the setup function. I feel I somewhat prefer this, to make it (much) more automated. Discussion is welcomed, I know it reduces fine-grain-user-control, but it feels reasonable to me in this case.

Prior to line 256 (subsetting ccList$calcNodesNoSelf) would it be wise to force asisEffects through expandNodeNames)? When asisEffects is user-supplied, I imagine it would usually be only a variable name re, or perhaps indexed as re[1:n].

Would likes 256 and 257 be more clear (and perhaps more efficient) if written using setdiff ?

Also, it seems line 257 might have a typo. As currently written, wouldn't this give a warning about non-multiple vector lengths? Was it intended to be:

ccList$calcNodesNoSelf <- ccList$calcNodesNoSelf[!ccList$calcNodesNoSelf %in% asisEffects]

I agree a lot of the checks should be added, per the comments in the current code. Perhaps also add:

## TODO: check asisEffects none are data (quite defensive)
## TODO: check asisEffects all come from the same declID (perhaps more important)

The second of those, above, seems important to cover some non-standard cases, e.g. some of the random effects come from a different declaration for example:

for(i in     1:r)   b[i] ~ dnorm(target,                  sd = whatever)
for(i in (r+1):n)   b[i] ~ dnorm(target + something_else, sd = whatever)

or one could imagine where the mean is target in both cases, but the sd arguments are different: whatever1 and whatever2.

Documentation at the end of MCMC_samplers.R should be updated, to include the Yu and Meng (2011) reference.

I would please request some formatting changes, to be consistent with other code/samplers, but that can come later (and I'd be quite happy to make those changes).

Could a test be added, to demonstrate the correct asymptotic posterior is achieved?

paciorek commented 5 months ago

@danielturek thanks for this very useful and detailed feedback.

On the question of automated checking, I agree this would be nice. However, I am wary of (a) being confident about catching all the special cases given our past experience in analogous situations and (b) the extra work of doing that given other priorities. My thought was that we describe to users when the sampler is appropriate and move some of the burden onto the user, albeit with some double-checking.

danielturek commented 5 months ago

@paciorek What you said makes sense; I certainly understand your point that "in the past, we're never able to anticipate and handle all use-cases; users always come up with something we didn't imagine".

My only lingering hesitation, is this approach makes the new option a big step more annoying to use. It would be so much nicer to write:

conf$addSampler("mu", "RW", includeRE = TRUE)

In the end, I'm fine either way.

paciorek commented 5 months ago

A middle ground might be to allow your example input but warn user in docs and, specifically, in a printed note what the sampler assumes, particularly for things we don't fully check. Let me think on that particular aspect more.

perrydv commented 5 months ago

Hi @paciorek @danielturek I am catching up here. A few thoughts:

paciorek commented 5 months ago

Here's a variation on some of the things under discussion.

We could have a distinct sampler, perhaps: sampler_joint_effects or sampler_noncentered_effects or sampler_joint_RE. In its setup we then have various options for checking and what not. In run it just calls out sampler_RW. (We could also allow a user to ask for various types of samplers on the top node, e.g. RW or slice, if we want a version of this that does slice sampling for top nodes.)

The actual sampling could done in our current sampler(s), in particular sampler_RW. Now that would mean that, as I have in the current PR, sampler_RW has some code blocks that are new and only for this use case. I don't see a way of avoiding that given the code blocks are embedded in the sampler. But if we have the user interface go through a separate sampler, then I think it's clearer to users what the sampler is, and we don't need to modify documentation for our core samplers. Of course we could also start out with the embedded sampler being a duplicate of sampler_RW but with the extra blocks and then combine into sampler_RW at a later date if avoiding code duplication seems like the right thing to do after further consideration. For the moment, actually, maybe that is the way to go.

To @perrydv 's point about correctness, yes, agreed and very good point. Where I am leaning at the moment is that we put in all the calculations to ensure correctness but then if we can in a simple fashion check and be confident that we are in a situation where we can leave calculations out, then we leave them out. I will also see, in the CDFW case, how much time is added to do the extra calculations - it may well be minimal.

paciorek commented 4 months ago

Based on this discussion and an offline discussion with Perry, here's what I've done:

Not yet done: writing documentation for sampler_noncentered and setting up tests.

paciorek commented 4 months ago

Ok, one update here. If we want to have a single sampler_RW that has regular sampler and allows the noncentered variation, we'd need a nimbleFunctionList to handle the fact that in run code we need to call getParam(node, 'mean') inside an if statement that is not active when using standard RW sampler.

At the moment this has tipped the balance into having our current sampler_RW and a new sampler_RW_noncentered and similarly for slice sampler. There will be a bunch of duplicated code. Could be revisited in the future.

paciorek commented 4 months ago

@perrydv @danielturek (@kenkellner in case you're interested given we've used this in the CDFW comparisons), I've cleaned up the noncentered sampler, adding roxygen and testing, and fixing somethings revealed in testing.

I think this is close to finished. The roxygen deserves some attention as I spent some time trying to describe the sampler carefully, but more eyes would help.

@danielturek please weigh in now on what formatting changes you'd like.

danielturek commented 4 months ago

@paciorek I looked over this carefully, and it looks great. This will be a nice addition.

perrydv commented 4 months ago

@paciorek I am taking a fresh look at this as we run up to a release.

perrydv commented 4 months ago

Also, how about shortening control list arguments:

paciorek commented 4 months ago

I've added a dynamic check for presence of a mean parameter.

As far as simpler names, we can't use type as that would conflict with the type='noncentered' if the user passes control args via .... I'll change to sampler and param.